[Aptitude-svn-commit] r3260 - in branches/aptitude-0.3/aptitude: . src/generic/problemresolver

Wed, 04 May 2005 17:51:18 +0000

Author: dburrows
Date: Wed May  4 17:51:15 2005
New Revision: 3260

Modified:
   branches/aptitude-0.3/aptitude/ChangeLog
   branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex
Log:
Significant edits, minor addition.

Modified: branches/aptitude-0.3/aptitude/ChangeLog
==============================================================================

--- branches/aptitude-0.3/aptitude/ChangeLog	(original)
+++ branches/aptitude-0.3/aptitude/ChangeLog	Wed May  4 17:51:15 2005
@@ -1,3 +1,9 @@
+2005-05-04  Daniel Burrows  <dburrows@debian.org>
+
+	* src/generic/problemresolver/model.tex:
+
+	Edit a bunch of stuff in the document, and write up that second proof.
+
 2005-05-01  Daniel Burrows  <dburrows@debian.org>
 
 	* doc/en/aptitude.xml:

Modified: branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex
==============================================================================
--- branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex	(original)
+++ branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex	Wed May  4 17:51:15 2005
@@ -5,7 +5,10 @@
 \usepackage{pgf}
 \usepackage{amsmath}
 \usepackage{amsthm}
+\usepackage{amssymb}
 \usepackage{semantic}
+\usepackage{mathpartir}
+\usepackage{varioref}
 
 \pagestyle{fancy}
 
@@ -39,9 +42,13 @@
 \newcommand{\idist}[2]{\langle#1,#2\rangle}
 \newcommand{\len}[1]{\lvert#1\rvert}
 
-\newcommand{\nsol}[1]{\overset{#1}{\Rightarrow}}
+\newcommand{\nsol}[2]{\overset{#1}{\underset{#2}{\Rightarrow}}}
 \newcommand{\nsolmany}[1]{\overset{#1}{\underset{*}{\Rightarrow}}}
 
+\newcommand{\installs}{\vartriangleright}
+\newcommand{\satisfies}{\vdash}
+\newcommand{\col}{\mathpunct{:}}
+
 \mathlig{->}{\rightarrow}
 \mathlig{=>*}{\overset{*}{=>}}
 
@@ -212,13 +219,14 @@
 to find solutions to dependency problems, and to cause algorithms that
 manipulate dependencies to become horribly messy.  In situations like
 this, it is often a good idea to find a simpler, more mathematical
-model of the problem to be analyzed.  Of course, it is well-known that
+model of the problem being analyzed.  Of course, it is well-known that
 package dependencies can be reduced to the satisfaction of Boolean
 equations, but such a reduction is arguably \emph{too} extreme: it
 certainly results in a mathematical model, but the model it produces
-hides the structure of the original problem.  I will instead propose a
-model which is sufficient to describe any dependency problem (at least
-in Debian) and retains the structure of a package system.
+hides the structure of the original problem.  The following section
+describes an alternate model which is sufficient to capture any
+dependency problem (at least in Debian) and retains the structure of a
+package system.
 
 \subsection{Basic Concepts}
 
@@ -231,7 +239,7 @@
 $\pkgof{v}$.
 
 To represent the state of the entire system, the following sets are
-available:
+defined:
 
 \begin{itemize}
 \item $\P$ is the set of all packages.
@@ -241,6 +249,9 @@
     asserted in the current package system.}
 \end{itemize}
 
+Throughout this paper, I will assume that $\P$ and $\V$ (and hence
+$\D$) are finite.
+
 \subsection{Reduction of Debian Dependencies to the Model}
 
 As claimed above, it is possible to reduce a Debian dependency system
@@ -253,7 +264,7 @@
   additional version for each package.  This version represents the
   state of the package when it is not installed.  Versions
   corresponding to versions of the Debian package are indicated by $p
-  : n$ where $n$ is the version number, while the ``uninstalled''
+  \col n$ where $n$ is the version number, while the ``uninstalled''
   version is indicated by $p_u$.
 \item For each dependency of the version $v$ of a package on
   $\pkg{A}_1\ |\ \dots$, accumulate a set $S$ containing all the
@@ -265,7 +276,7 @@
   versions $3.1$, $3.2$, and $3.3$ of package \pkg{A} are available,
   versions $1$, $2$, and $3$ of package \pkg{B} are available, and
   package \pkg{C} version $3.14$ provides \pkg{B}, then
-  $S=\{\pkg{A}:3.2,\pkg{A}:3.3,\pkg{B}:1,\pkg{B}:2,B:3,\pkg{C}:3.14\}$.
+  $S=\{\pkg{A} \col 3.2,\pkg{A} \col 3.3,\pkg{B} \col 1,\pkg{B} \col 2,B \col 3,\pkg{C} \col 3.14\}$.
 
   $\D$ contains $v -> S$ for every such dependency.
 \item For each conflict declared by a version $v$ on the package $p$,
@@ -279,7 +290,7 @@
   For instance, if $v$ conflicts with \pkg{A}, of which versions 3.2
   and 3.3 are available, versions 2 and 3 of \pkg{B} provide \pkg{A},
   and no other versions of \pkg{B} are available, then
-  $S=\{\pkg{A}:3.2,\pkg{A}:3.3,\pkg{A}:\pkg{UNINST},\pkg{B}:2,\pkg{B}:\pkg{UNINST}\}$.
+  $S=\{\pkg{A} \col 3.2,\pkg{A} \col 3.3,\pkg{A} \col \pkg{UNINST},\pkg{B} \col 2,\pkg{B} \col \pkg{UNINST}\}$.
 
   \begin{note}
     In reality, extra care must be taken to screen out self-conflicts
@@ -311,10 +322,13 @@
 \end{definition}
 
 \begin{definition}
-  An installation $I$ is \emph{consistent} if it satisfies each
-  dependency in the world: that is, for all $v -> \{v_1',\dots\}$ in
-  $\D$, either $I \not \installs v$ or $I \installs v_i'$ for some
-  $i$.
+  An installation $I$ satisfies a dependency $d=v -> S$ if either $I
+  \not \installs v$ or $I \installs v'$ for some $v' \in S$.
+\end{definition}
+
+\begin{definition}
+  An installation $I$ is \emph{consistent} if $I \satisfies d$ for all
+  $d \in \D$.
 \end{definition}
 
 \begin{definition}
@@ -328,6 +342,13 @@
       v, & p' = p
     \end{cases}
   \end{equation}
+
+  As a shorthand, the following notation indicates that a particular
+  version of a package is to be installed:
+
+  \begin{equation}
+    I;v = I;\pkgof(v) \mapsto v
+  \end{equation}
 \end{definition}
 
 \section{The Dependency Resolution Problem}
@@ -336,11 +357,11 @@
 
 Let $I_0$ be an inconsistent installation.  We would like to find a
 consistent installation that is ``similar to'' $I_0$.  This is the
-dependency resolution problem.  In a real package system, it
+dependency resolution problem.  In a real package manager, it
 corresponds to a situation in which the user's requests have resulted
 in an invalid state (with unsatisfied dependencies or conflicts); the
-goal of a package manager in this situation is to find a ``small'' and
-``good'' set of changes that result in a valid state.
+goal of the package manager in this situation is to find a ``small''
+and ``good'' set of changes that result in a valid state.
 
 \begin{note}
   This problem is poorly defined: ``small'' and ``good'' are not
@@ -377,43 +398,45 @@
 
   Create one package for each variable and for each clause in the SAT
   problem.  For each variable $x$, let the versions of the
-  corresponding package be $x:0$ and $x:1$; for each clause, create
-  exactly one version.  For each SAT clause let $v_c$ be the package
-  corresponding to the clause, and insert $v_c -> S$ into $\D$, where
-  for each literal of a variable $x$ appearing in the clause, $S$
-  contains $x:0$ if $x$ is a negative literal and $x:1$ if $x$ is a
-  positive literal.  This reduction is clearly polynomial-time; I
-  claim that a solution to this set of dependencies exists if and only
-  if a solution to the corresponding SAT problem exists.
+  corresponding package be $x \col 0$ and $x \col 1$; for each clause,
+  create exactly one version.  For each SAT clause let $v_c$ be the
+  package corresponding to the clause, and insert $v_c -> S$ into
+  $\D$, where for each literal of a variable $x$ appearing in the
+  clause, $S$ contains $x \col 0$ if $x$ is a negative literal and $x
+  \col 1$ if $x$ is a positive literal.  This reduction is clearly
+  polynomial-time; I claim that a solution to this set of dependencies
+  exists if and only if a solution to the corresponding SAT problem
+  exists.
 
   Suppose that there is an assignment that solves the SAT problem.
   Define an installation $I$ as follows: if $p$ corresponds to a
   clause, $I(p)$ is the single version of $p$; if $p$ corresponds to a
-  variable $x$, $I(p)=x:0$ if $x$ is FALSE in the SAT solution and
-  $x:1$ if $x$ is TRUE in the SAT solution.  Now, consider any
-  dependency $v -> S$.  From the construction above, $S$ corresponds
-  to a clause of the SAT instance.  At least one literal in this
-  clause must be assigned the value TRUE (otherwise the clause is not
-  satisfied); let $x$ be the corresponding variable.  If the literal
-  is positive, then (by construction) $S$ contains $x:1$ and (since
-  $x$ must be assigned the value TRUE) $I(p)=x:1$.  On the other hand,
-  if the literal is negative, then $S$ contains $x:0$ and $I(p)=x:0$.
-  Thus, $I$ is a consistent installation.
+  variable $x$, $I(p)=x \col 0$ if $x$ is FALSE in the SAT solution
+  and $I(p)=x \col 1$ if $x$ is TRUE in the SAT solution.  Now,
+  consider any dependency $d=v -> S$.  From the construction above,
+  $S$ and $v$ correspond to a clause of the SAT instance.  At least
+  one literal in this clause must be assigned the value TRUE
+  (otherwise the clause is not satisfied); let $x$ be the
+  corresponding variable.  If the literal is positive, then (by
+  construction) $S$ contains $x \col 1$; since $x$ must be assigned
+  the value TRUE. $I \installs x \col 1$.  Hence, $I \satisfies d$.
+  On the other hand, if the literal is negative, then $S$ contains $x
+  \col 0$ and $I \installs x \col 0$, so $I \satisfies d$.  Thus, $I$
+  is a consistent installation.
 
   On the other hand, suppose that there is a consistent installation
   $I$.  For all variables $x$, let $p$ be the corresponding package;
-  if $I(p)=x:0$, assign FALSE to $x$, and if $I(p)=x:1$, assign TRUE
-  to $x$.  Now consider any clause in the SAT problem: from the
-  construction above, $\D$ contains a dependency $v_c -> S$ where
-  $v_c$ is the single version of the package corresponding to the
-  clause.  Since $I$ must map the clause's package to $v_c$ and $I$ is
-  consistent, there must be a version $v \in S$ such that
-  $I(\pkgof{v})=v$; i.e., $I$ installs $v$.  But from the
-  construction, $v^p$ corresponds to some variable $x$, and $v$ is
-  either $x:1$, where $x$ appears as a positive literal in the clause
-  or $x:0$, where $x$ appears as a negative literal in the clause.
-  Thus, the clause is satisfied, and so the assignment described above
-  satisfies all clauses.
+  if $I(p)=x \col 0$, assign FALSE to $x$, and if $I(p)=x \col 1$,
+  assign TRUE to $x$.  Now consider any clause in the SAT problem:
+  from the construction above, $\D$ contains a dependency $v_c -> S$
+  where $v_c$ is the single version of the package corresponding to
+  the clause.  Since we must have $I \installs v_c$ and since $I$ is
+  consistent, there must be a version $v' \in S$ such that $I
+  \installs v'$.  But from the construction, there is some $x$ such
+  that $v$ corresponds to either $x \col 1$, where $x$ appears as a
+  positive literal in the clause or $x \col 0$, where $x$ appears as a
+  negative literal in the clause.  Thus, the clause is satisfied, and
+  so the assignment described above satisfies all clauses.
 
   Therefore, dependency resolution is NP-complete.
 \end{proof}
@@ -439,27 +462,27 @@
 the problem can be declared either solvable or unsolvable on the basis
 of a quick analysis of that region of the graph.
 
-In fact, when even relatively basic search techniques are applied to a
-typical dependency problem, the difficulties that arise are related
-not to a paucity of solutions, but rather to an excess of them.  That
-is, finding \emph{a} solution is easy, but finding the \emph{right}
-solution is more problematic.  Indeed, in the Debian framework there
-is always at least one solution: removing every package on the system
-will satisfy all the dependencies.  However, for obvious reasons, this
-is not a solution that we want to produce!
+In fact, when even relatively basic search techniques are applied to
+many typical dependency problems, the difficulties that arise are
+related not to a paucity of solutions, but rather to an excess of
+them.  That is, finding \emph{a} solution is easy, but finding the
+\emph{right} solution is more problematic.  Indeed, in the Debian
+framework there is always at least one solution: removing every
+package on the system will satisfy all the dependencies.  However, for
+obvious reasons, this is not a solution that we want to produce!
 
 \subsection{Solving Dependencies Through Best-First Search}
 
-I propose the use of a relatively simple algorithm -- best-first
-search -- to resolve dependencies.  To briefly review, best-first
-search works by keeping a priority queue, known as the ``open'' queue,
-of potential (or partial) solutions.  The priority queue is sorted
-according to some heuristic function that quantifies the ``goodness''
-of each node (often in terms of nearness to a full solution).  In each
-iteration of the algorithm, the ``best'' partial solution is removed
-from the queue.  If it is a full solution, the algorithm terminates;
-otherwise, each ``successor'' node is generated and placed in the
-queue.
+This problem statement suggests the use of a relatively simple
+algorithm -- best-first search -- to resolve dependencies.  To briefly
+review, best-first search works by keeping a priority queue, known as
+the ``open'' queue, of potential (or partial) solutions.  The priority
+queue is sorted according to some heuristic function that quantifies
+the ``goodness'' of each node (often in terms of nearness to a full
+solution).  In each iteration of the algorithm, the ``best'' partial
+solution is removed from the queue.  If it is a full solution, the
+algorithm terminates; otherwise, each ``successor'' node is generated
+and placed in the queue.
 
 There are two main issues to resolve:
 
@@ -472,8 +495,8 @@
 to a single package.  However, this would result in a gigantic
 branching factor (over 1500 branches at each step in the current
 Debian archive), and it would cause the algorithm to consider
-adjusting packages that are utterly irrelevant to the problem at hand,
-as well as changing a package multiple times (which can lead to
+adjusting packages that were utterly irrelevant to the problem at
+hand, as well as changing a package multiple times (which can lead to
 choices being made for reasons that are obscure to the user).  A more
 focussed approach is needed.
 
@@ -482,7 +505,8 @@
 as to how dependencies should be resolved.  If \pkg{A} depends on
 \pkg{B}, \pkg{A} is installed, and \pkg{B} is not installed, it is
 usually better to install \pkg{B} than to remove \pkg{A}; however, a
-straight count of broken dependencies might prefer either outcome.
+straight count of broken dependencies would consider both solutions to
+be equally ``good''.
 
 \subsubsection{Generating Successors to a Partial Solution}
 
@@ -504,22 +528,25 @@
 generating solutions: a solution should never modify a package twice.
 
 \begin{definition}
-  If the original installation was $I_0$, then for any $I$, the
-  installation $I'=I;p \mapsto v$ is a successor of $I$ if $v \neq
-  I_0(\pkgof{v})$ and $I(\pkgof{v})=I_0(\pkgof{v})$.
+  If the original installation was $I_0$, then for any $I$ and any $d
+  \in \D$ such that $I \not \satisfies d$, the installation $I'=I;v$
+  is a successor of $I$ for $d$ if $v \neq I_0(\pkgof{v})$ and
+  $I(\pkgof{v})=I_0(\pkgof{v})$.
 \end{definition}
 
 One might wonder whether this approach risks overlooking solutions:
-for instance, maybe it is really necessary to ``go in circles'' in
-order to find a particular solution.  As shown below, if a solution
-cannot be generated by repeatedly using this limited set of
-successors, it is only because there is a ``simpler'' version of the
-solution (one that modifies the states of fewer packages) that is
+for instance, maybe it really is necessary to ``go in circles'' in
+order to find a particular solution.  However, as shown below, if a
+solution cannot be generated through the application of the successor
+rule defined above, then there is a ``simpler'' version of that
+solution (one which modifies the states of fewer packages) that can be
 generated.  To prove this, I first will introduce some definitions and
-notations:
+notation.
 
 \begin{definition}
-  Let $I_1$, $I_2$ be installations.
+  Let $I_1$, $I_2$ be installations.  The following notation is used
+  to denote the ``distance'' from $I_1$ to $I_2$ (defined as the
+  number of packages whose mappings differ between $I_1$ and $I_2$).
 
   \begin{equation}
     \idist{I_1}{I_2}=\len{\{p \st I_1(p) \neq I_2(p)\}}
@@ -532,45 +559,47 @@
   $I'(p)=I_1(p)$ or $I'(p)=I_2(p)$.
 \end{definition}
 
+\begin{note}
+  An alternative phrasing is that if $I'$ is a hybrid of $I_1$ and
+  $I_2$, then for all $v$ such that $I' \installs v$, either $I_1
+  \installs v$ or $I_2 \installs v$.
+\end{note}
+
 \begin{definition}
-  If $I'$ is a successor of $I$ with respect to $I_0$, then $I
-  \nsol{I_0} I'$.  If there exist $I_1, \dots, I_n$ such that $I_1
-  \nsol{I_0} I_2 \nsol{I_0} \dots \nsol{I_0} I_n$, then $I_1
+  If $I'$ is a successor of $I$ with respect to $I_0$ for the
+  dependency $d$, then $I \nsol{I_0}{d} I'$.  If there exist $I_1,
+  \dots, I_n$ and $d_1, \dots, d_n$ such that $I_1 \nsol{I_0}{d_1} I_2
+  \nsol{I_0}{d_2} \dots \nsol{I_0}{d_{n-1}} I_n$, then $I_1
   \nsolmany{I_0} I_n$.
 \end{definition}
 
 \begin{lemma}
   Let $I_c$ be any consistent installation (if one exists) and $I_0$
-  be any installation.  For all hybrids $I$ of $I_0$ and $I_c$, either
-  $I$ is consistent or there exists a successor $I'$ of $I$ such that
-  $I'$ is a hybrid of $I_0$ and $I_c$, and $\idist{I'}{I_c} <
-  \idist{I}{I_c}$.
+  be any installation.  For all hybrids $I$ of $I_0$ and $I_c$ and all
+  dependencies $d \in \D$ such that $I \not \satisfies d$, there
+  exists an $I'$ such that $I \nsol{I_0}{d} I'$, $I'$ is a hybrid of
+  $I_0$ and $I_c$, and $\idist{I'}{I_c} < \idist{I}{I_c}$.
 \end{lemma}
 
 \begin{proof}
   Consider any hybrid $I$ of $I_0$ and $I_c$ such that $I$ is not a
-  solution.  Since $I$ is inconsistent, there is a dependency $v->S$
-  such that $I(\pkgof{v})=v$ and for all $v' \in S$, $I(\pkgof{v'})
-  \neq v'$.  However, since $I_c$ is consistent, either
-  $I_c(\pkgof{v}) \neq v$ or there exists a $v' \in S$ such that
-  $I_c(\pkgof{v'})=v'$.
+  solution and any $d=v -> S \in \D$ such that $I \not \satisfies d$.
 
-  Suppose that $I_c(\pkgof{v}) \neq v$.  Since $I$ is a hybrid of
-  $I_0$ and $I_c$, $I_0(\pkgof{v})=v$.  Thus, $I \nsol{I_0} I'$, where
+  Suppose that $I_c \not \installs v$.  Since $I$ is a hybrid of $I_0$
+  and $I_c$, $I_0 \installs v$.  Thus, $I \nsol{I_0}{d} I'$, where
 
   \begin{equation}
-    I'=I;\pkgof{v} \mapsto I_c(\pkgof{v})
+    I'=I;I_c(\pkgof{v})
   \end{equation}
 
-  On the other hand, if $I_c(\pkgof{v'})=v'$ for some $v' \in S$, note
-  that as before, $I_0(\pkgof{v'}) \neq v'$.  Therefore, $I \nsol{I_0}
-  I'$, where
+  On the other hand, if $I_c \installs v'$ for some $v' \in S$, then
+  $I_0 \not \installs v'$.  Therefore, $I \nsol{I_0}{d} I'$, where
 
   \begin{equation}
-    I'=I;\pkgof{v'} \mapsto v'
+    I'=I;v'
   \end{equation}
 
-  In either case, clearly $I'$ is a hybrid of $I_0$ and $I_c$, and
+  In either case, clearly $I'$ is a hybrid of $I_0$ and $I_c$ and
   $\idist{I'}{I_c}<\idist{I}{I_c}$, proving the lemma.
 \end{proof}
 
@@ -583,10 +612,11 @@
 \end{theorem}
 
 \begin{proof}
-  Proof is by repeated application of the previous lemma.  For each
-  inconsistent hybrid $I$ of $I_0$ and $I_c$, let $I^{+}$ be the
-  hybrid shown to exist in the previous lemma, and define a sequence
-  $I_1, \dots$ as follows:
+  Proof is by repeated application of the previous lemma.  Consider
+  any inconsistent hybrid $I$ of $I_0$ and $I_c$.  Let $I^{+}$ be the
+  $I'$ shown to exist in the previous lemma for an arbitrary $d$ such
+  that $I \not \satisfies d$, and define a sequence $I_1, \dots$ as
+  follows:
 
   \begin{equation}
     I_k = \begin{cases}
@@ -618,57 +648,22 @@
 ``reward'' avenues of inquiry that decrease the number of unsatisfied
 dependencies.  This is not, of course, guaranteed to produce a
 solution quickly; however, in practice, it seems to be a sufficient
-hint for the algorithm to reach a goal node in under 100 steps.
+hint for the algorithm to reach a goal node in a reasonable number of
+steps\footnote{Most searches seem to converge in under 5000 steps.}.
 Finding ``good'' solutions is somewhat more difficult, not least
 because of the fact that ``good'' is an ill-defined property.  The
 experimental implementation of this algorithm in \pkg{aptitude} uses
-the following criteria to assign scores to nodes:
+the following general criteria to assign scores to nodes:
 
 \begin{itemize}
-\item Most importantly, each version of each package is assigned a
-  separate score.  The score of a potential solution is based in part
-  on the \emph{difference} between the total score of the package
-  versions it installs and the score of the versions installed by
-  $I_0$.\footnote{In principle, a solution's score is simply the total
-    score of the package versions it installs; $I_0$ is taken to be a
-    baseline for this component of the score because it does not
-    affect the ordering of solutions, and it is much more convenient
-    to calculate.}
-
-  These weights define the priorities of the search.  In the current
-  aptitude implementation, as of this writing, the scores of the
-  versions the package $p$ are assigned as follows\footnote{Very minor
-    tweaking is also performed to adjust the score of packages
-    according to their Debian-assigned ``priority'', but this affects
-    the score by no more than 5 points.}:
-
-  \begin{itemize}
-  \item If $I_0(p)$ was explicitly selected by the user, it is
-    assigned $40$ points.
-  \item If $I_0(p)$ was automatically selected, it is assigned $0$
-    points.  This has the effect of biasing the algorithm towards
-    modifying packages whose current states were not explicitly
-    requested by the user.
-  \item If $I_0$ does not remove $p$, its removal is assigned $-300$
-    points.  If $p$ is an ``essential'' package, such as the C library
-    or the \pkg{dpkg} package manager, its removal is assigned an
-    additional $-100000$ points.
-  \item If $I_0(p)$ is not the current version of the
-    package\footnote{The ``current version'' refers to the version
-      installed on the system; $I_0$ is simply a representation of the
-      package installation and removal commands issued by the user.},
-    then $p$'s current version -- which corresponds to cancelling a
-    user request or automatic decision -- is assigned $-50$ points.
-  \item Installing $p$ (i.e., the ``default'' version of $p$ if $p$ is
-    not currently installed) is assigned $-20$ points.
-  \item Upgrading $p$ (i.e., the ``default'' version of $p$ if $p$ is
-    currently installed at a different version\footnote{This is a
-      slight misnomer, as it includes downgrades to the default
-      version.}) is assigned $-15$ points.
-  \item Installing a version of $p$ other than the ``default'' version
-    (as usual, if $I_0$ does not install this version of $p$) is
-    assigned -40 points.
-  \end{itemize}
+\item Each version of each package is assigned a separate score.  By
+  default, removing any package is heavily penalized, altering
+  packages which were automatically installed recieves a smaller
+  penalty, maintaining the state of an automatic package makes no
+  contribution to the score, and maintaining the state of a manually
+  installed package receives a bonus.\footnote{In actuality, all that
+    is calculated is the difference between the initial total version
+    score and the final total version score.}
 
 \item A penalty is applied to each search node based on its distance
   from the root of the search.  This works to favor ``simpler''
@@ -676,11 +671,10 @@
 
 \item Nodes that resolve all dependencies are given an additional
   bonus -- usually a relatively minor one.  Goal nodes are moved
-  through the priority queue in the normal manner, rather than
-  immediately being returned, in order to ensure that ``better''
-  solutions are produced first -- and to make sure that solutions that
+  through the priority queue in the normal manner, rather than being
+  floated directly to its head, in order to ensure that solutions that
   are particularly ``bad'' are not produced unless it is absolutely
-  necessary.
+  necessary to do so.
 \end{itemize}
 
 Thus, letting $B(I)$ be the set of dependencies that are ``broken''
@@ -696,7 +690,9 @@
 if $i=j$ and $0$ otherwise).  In the current implementation,
 $\alpha_B=-100$, $\alpha_L=-10$, and $\alpha_G=50$.
 
-\section{Avoiding Exponential Blowup}
+\section{Reducing the Branching Factor}
+
+\subsection{One Dependency at a Time}
 
 The algorithm laid out above is sufficient to solve many of the
 dependency problems that are encountered in practice.  However, some
@@ -722,17 +718,17 @@
   Unfortunately, some packages have very many reverse dependencies.
   For instance, if $I$ removes the system C library, over a thousand
   dependencies will be unsatisfied -- and simply generating the
-  successors of this node will require time \emph{quadratic} in the
-  number of reverse dependencies of \pkg{libc}; i.e., over a million
-  dependencies will be tested.  This can impose a significant
-  performance penalty on the process of successor generation.
+  successors of this node will require time at least \emph{quadratic}
+  in the number of reverse dependencies of \pkg{libc}.  This can
+  impose a significant performance penalty on the process of successor
+  generation.
 
 \item Removing the bottom of a dependency chain.
 
   When an important library such as GTK is removed, it is necessary to
   propagate the removal ``up the dependency tree''.  However, the
   search technique outlined above will search exponentially many
-  installations before coming to this solution.  Aside from the goal
+  installations before settling on this solution.  Aside from the goal
   node of ``keep the library on the system'', the first step of the
   search will enqueue one node for each package depending on GTK; each
   node will remove its corresponding package.  As these nodes are
@@ -743,43 +739,135 @@
   packages) will be generated.
 \end{enumerate}
 
-The problem of ``too many reverse dependencies'' is relatively easily
-solved.  Instead of generating sucecssors for every dependency, it is
-sufficient to generate successors for a single dependency (this is
-evident from the proof of Theorem~\ref{thm:succ-sufficient}; a
-stronger condition will be demonstrated explicitly below).  In theory,
-this practice could lead to somewhat less optimal ordering of
-generated solutions, but the decrease in the problem's branching
-factor is well worth it.
-
-The second problem is somewhat trickier, but it can be resolved by
-restructuring the search space.  To each solution node $I$, attach a
-set $I_F$ of \emph{forbidden} versions; the successors of $I$ are
-restricted to those which do not install any version in $I_F$.  For
-all successors $I'$ of $I$, $I'_F \subseteq I_F$; furthermore, if a
-successor $I'$ of $I$ is generated by removing the source version of a
-dependency, then all of the targets of that dependency are members of
-$I'_F$.  This has the effect of forcing the algorithm to ``stick
+There is a simple solution to both of these problems.  Instead of
+generating successors for every dependency, it is sufficient to
+generate successors for a single, arbitrary dependency (as shown in
+Theorem~\ref{thm:succ-sufficient}).  In theory, this could lead to
+somewhat less optimal ordering of generated solutions, but this
+doesn't seem to be a major problem in practice and the decrease in the
+problem's branching factor is well worth it.
+
+\subsection{Exclude Supersets of Solutions}
+
+One simple way to trim the search tree is to drop any search node $I$
+that is a ``superset'' of a full solution $I_c$ -- meaning that $I_c$
+is a hybrid of $I$ and $I_0$.  This has the additional beneficial
+effect of preventing solutions from being offered to the user which
+are just a previously-displayed solution with some extra, redundant
+actions added to it.
+
+\subsection{Forbidden versions}
+
+If we have a choice between removing \pkg{p} and installing \pkg{q},
+and we choose to remove \pkg{p}, why should we ever install \pkg{q}?
+This question leads to yet another way of reducing the problem's
+branching factor.
+
+To each solution node $I$, attach a set $F$ of \emph{forbidden}
+versions; the successors of $I$ are restricted to those which do not
+install any version in $F$.  For all successors $I'$ of $I$, let $F'
+\subseteq F$; furthermore, if a successor $I'$ of $I$ is generated by
+removing the source version of a dependency, then all of the targets
+of that dependency are members of $I'_F$.  This new successor
+relationship is formally defined in Figure~\vref{fig:excludever}.
+
+\begin{figure}
+  \begin{gather*}
+    \inferrule{I \not \satisfies d \\ d = v -> S \\
+      I(v)=I_0(v) \\ \pkgof{v'}=\pkgof{v} \\ v' \not \in F}
+    {(I,F) \nsol{I_0}{d} (I;v',F \cup S)} \\
+    \inferrule{I \not \satisfies d \\ d = v -> S \\ v' \in S \\
+      I(v')=I_0(v') \\ v' \not \in F}
+    {(I,F) \nsol{I_0}{d} (I;v', F)}
+  \end{gather*}
+
+  \caption{Successor generation with forbidden versions}
+  \label{fig:excludever}
+\end{figure}
+
+This has the effect of forcing the algorithm to ``stick
 with'' a decision to forgo installing the targets of a dependency in
 favor of shifting the source.
 
-In combination with the tracking of forbidden versions, the algorithm
-is also modified to detect \emph{essential conflicts}: that is,
-dependencies which cannot be resolved due to the constraints placed on
-the generation of successors.  If $d$ is a dependency not satisfied by
-$I$, and if every package version which could resolve a dependency is
-illegal -- either because of the ``only modify it once'' rule or
-because it is a member of $I_F$ -- then $d$ is deemed to be an
-essential conflict and $I$ is discarded without further processing.
+\begin{note}
+  This technique could just as well be applied by expanding the
+  forbidden set when generating successors for the \emph{targets} of a
+  dependency: that is, forbidding a different version of the source of
+  a dependency to be installed.  The decision regarding which
+  exclusion principle to use was made on the basis of a conjecture
+  that we are more likely to encounter a ``hard'' dependency problem
+  when moving ``up'' a dependency chain than when moving ``down'' it.
+\end{note}
 
 Of course, it is important to verify that cutting off wide swathes of
 the search space in this manner does not impede our ability to
 generate solutions:
 
 \begin{theorem}
-  TODO.
+  Let $I_c$ be any consistent installation (if one exists) and $I_0$
+  be any installation.  There exists an $I_c'$ such that $I_c'$ is a
+  hybrid of $I_0$ and $I_0 \nsolmany{I_0} I_C'$.
 \end{theorem}
 
+\begin{proof}
+  Let $F_0=\emptyset$.  I claim that there exists a sequence
+  $(I_1,F_1),\dots$ such that for all $k \geq 0$,
+
+  \begin{itemize}
+  \item For all $v \in F_k$, $I_c \not \installs v$.
+  \item $I_0 \nsolmany{I_0} I_k$
+  \item Either $k=0$, $I_{k-1}$ is consistent and $I_k=I_{k-1}$, or
+    $\idist{I_k}{I_c}<\idist{I_{k-1}}{I_c}$.
+  \end{itemize}
+
+  Proof is by induction on $k$.  Suppose that a sequence
+  $(I_1,F_1),\dots,(I_k,F_k)$ exists satisfying the above condition.
+  If $I_k$ is consistent, then let $I_{k+1}=I_k$ and $F_{k+1}=F_k$;
+  the inductive hypothesis is satisfied immediately.
+
+  Otherwise, consider any $d=v -> S \in \D$ such that $I_k \not
+  \satisfies d$ (since $I_k$ is inconsistent, at least one such $d$
+  exists).  If there is a $v' \in S$ such that $I_c \installs v'$,
+  then let $I_{k+1}=I_k;v'$ and $F_{k+1}=F_k$.  Clearly $I_c \not
+  \installs v''$ for all $v'' \in F_{k+1}$ and
+  $\idist{I_{k+1}}{I_c}<\idist{I_k}{I_c}$; since we additionally have
+  $(I_k,F_k) \nsol{I_0}{d} (I_{k+1},F_k)$, the inductive hypothesis
+  holds.
+
+  If instead $I_c \not \installs v'$ for all $v' \in S$, then since
+  $I_c$ is consistent, $I_c(\pkgof{v}) \neq v$.  Let
+  $I_{k+1}=I_k;I_c(\pkgof{v})$ and $F_{k+1}=F_k \cup S$. $I_c \not
+  \installs v''$ for all $v'' \in S$ by definition and clearly
+  $\idist{I_{k+1}}{I_c}<\idist{I_k}{I_c}$.  In addition, $I_k
+  \nsol{I_0}{d} I_{k+1}$ by Figure~\ref{fig:excludever}; therefore,
+  the inductive hypothesis holds.
+
+  Thus, the claim is established: such a sequence exists.  Following
+  the logic of Theorem~\ref{thm:succ-sufficient}, we can see that for
+  $n=\idist{I_0}{I_c}$, $I_n$ is a consistent installation.
+  Furthermore, from the construction above, $I_n$ is a hybrid of $I_0$
+  and $I_c$.  Thus, the theorem is established with $I_c'=I_n$.
+\end{proof}
+
+\subsection{Use Logical Necessity}
+
+In combination with the tracking of forbidden versions, it is also
+possible to detect \emph{forced installations} and \emph{essential
+  conflicts}.  A forced installation is one which is logically
+necessary given $I$ and $F$: for instance, if we have $d=v -> \{v'_1,
+v'_2\}$, $I$ has touched $v$ (i.e., $I(v) \neq I_0$), $v'_1 \in F$,
+and $v'_2 \notin I_F$, then the only permissible successor given $d$
+is $(I;v'_1)$.  An essential conflict is a dependency for which
+\emph{no} successors can be generated: for instance, if in the
+previous example we instead had $v'_2 \in F$, then $d$ would be an
+essential conflict.
+
+If any essential conflicts exist in an installation $I$, it is
+discarded immediately (rather than, for instance, generating
+successors for all the solvable dependencies).  If any forced
+installations exist, they are accumulated and a successor formed by
+adding these installations to $I$ is placed into the open queue.
+
 \section{Implementation}
 
 A prototype implementation exists in the experimental branch of