[Aptitude-svn-commit] r3260 - in branches/aptitude-0.3/aptitude: . src/generic/problemresolver
Daniel Burrows
dburrows@costa.debian.org
Wed, 04 May 2005 17:51:18 +0000
Author: dburrows
Date: Wed May 4 17:51:15 2005
New Revision: 3260
Modified:
branches/aptitude-0.3/aptitude/ChangeLog
branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex
Log:
Significant edits, minor addition.
Modified: branches/aptitude-0.3/aptitude/ChangeLog
==============================================================================
--- branches/aptitude-0.3/aptitude/ChangeLog (original)
+++ branches/aptitude-0.3/aptitude/ChangeLog Wed May 4 17:51:15 2005
@@ -1,3 +1,9 @@
+2005-05-04 Daniel Burrows <dburrows@debian.org>
+
+ * src/generic/problemresolver/model.tex:
+
+ Edit a bunch of stuff in the document, and write up that second proof.
+
2005-05-01 Daniel Burrows <dburrows@debian.org>
* doc/en/aptitude.xml:
Modified: branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex
==============================================================================
--- branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex (original)
+++ branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex Wed May 4 17:51:15 2005
@@ -5,7 +5,10 @@
\usepackage{pgf}
\usepackage{amsmath}
\usepackage{amsthm}
+\usepackage{amssymb}
\usepackage{semantic}
+\usepackage{mathpartir}
+\usepackage{varioref}
\pagestyle{fancy}
@@ -39,9 +42,13 @@
\newcommand{\idist}[2]{\langle#1,#2\rangle}
\newcommand{\len}[1]{\lvert#1\rvert}
-\newcommand{\nsol}[1]{\overset{#1}{\Rightarrow}}
+\newcommand{\nsol}[2]{\overset{#1}{\underset{#2}{\Rightarrow}}}
\newcommand{\nsolmany}[1]{\overset{#1}{\underset{*}{\Rightarrow}}}
+\newcommand{\installs}{\vartriangleright}
+\newcommand{\satisfies}{\vdash}
+\newcommand{\col}{\mathpunct{:}}
+
\mathlig{->}{\rightarrow}
\mathlig{=>*}{\overset{*}{=>}}
@@ -212,13 +219,14 @@
to find solutions to dependency problems, and to cause algorithms that
manipulate dependencies to become horribly messy. In situations like
this, it is often a good idea to find a simpler, more mathematical
-model of the problem to be analyzed. Of course, it is well-known that
+model of the problem being analyzed. Of course, it is well-known that
package dependencies can be reduced to the satisfaction of Boolean
equations, but such a reduction is arguably \emph{too} extreme: it
certainly results in a mathematical model, but the model it produces
-hides the structure of the original problem. I will instead propose a
-model which is sufficient to describe any dependency problem (at least
-in Debian) and retains the structure of a package system.
+hides the structure of the original problem. The following section
+describes an alternate model which is sufficient to capture any
+dependency problem (at least in Debian) and retains the structure of a
+package system.
\subsection{Basic Concepts}
@@ -231,7 +239,7 @@
$\pkgof{v}$.
To represent the state of the entire system, the following sets are
-available:
+defined:
\begin{itemize}
\item $\P$ is the set of all packages.
@@ -241,6 +249,9 @@
asserted in the current package system.}
\end{itemize}
+Throughout this paper, I will assume that $\P$ and $\V$ (and hence
+$\D$) are finite.
+
\subsection{Reduction of Debian Dependencies to the Model}
As claimed above, it is possible to reduce a Debian dependency system
@@ -253,7 +264,7 @@
additional version for each package. This version represents the
state of the package when it is not installed. Versions
corresponding to versions of the Debian package are indicated by $p
- : n$ where $n$ is the version number, while the ``uninstalled''
+ \col n$ where $n$ is the version number, while the ``uninstalled''
version is indicated by $p_u$.
\item For each dependency of the version $v$ of a package on
$\pkg{A}_1\ |\ \dots$, accumulate a set $S$ containing all the
@@ -265,7 +276,7 @@
versions $3.1$, $3.2$, and $3.3$ of package \pkg{A} are available,
versions $1$, $2$, and $3$ of package \pkg{B} are available, and
package \pkg{C} version $3.14$ provides \pkg{B}, then
- $S=\{\pkg{A}:3.2,\pkg{A}:3.3,\pkg{B}:1,\pkg{B}:2,B:3,\pkg{C}:3.14\}$.
+ $S=\{\pkg{A} \col 3.2,\pkg{A} \col 3.3,\pkg{B} \col 1,\pkg{B} \col 2,B \col 3,\pkg{C} \col 3.14\}$.
$\D$ contains $v -> S$ for every such dependency.
\item For each conflict declared by a version $v$ on the package $p$,
@@ -279,7 +290,7 @@
For instance, if $v$ conflicts with \pkg{A}, of which versions 3.2
and 3.3 are available, versions 2 and 3 of \pkg{B} provide \pkg{A},
and no other versions of \pkg{B} are available, then
- $S=\{\pkg{A}:3.2,\pkg{A}:3.3,\pkg{A}:\pkg{UNINST},\pkg{B}:2,\pkg{B}:\pkg{UNINST}\}$.
+ $S=\{\pkg{A} \col 3.2,\pkg{A} \col 3.3,\pkg{A} \col \pkg{UNINST},\pkg{B} \col 2,\pkg{B} \col \pkg{UNINST}\}$.
\begin{note}
In reality, extra care must be taken to screen out self-conflicts
@@ -311,10 +322,13 @@
\end{definition}
\begin{definition}
- An installation $I$ is \emph{consistent} if it satisfies each
- dependency in the world: that is, for all $v -> \{v_1',\dots\}$ in
- $\D$, either $I \not \installs v$ or $I \installs v_i'$ for some
- $i$.
+ An installation $I$ satisfies a dependency $d=v -> S$ if either $I
+ \not \installs v$ or $I \installs v'$ for some $v' \in S$.
+\end{definition}
+
+\begin{definition}
+ An installation $I$ is \emph{consistent} if $I \satisfies d$ for all
+ $d \in \D$.
\end{definition}
\begin{definition}
@@ -328,6 +342,13 @@
v, & p' = p
\end{cases}
\end{equation}
+
+ As a shorthand, the following notation indicates that a particular
+ version of a package is to be installed:
+
+ \begin{equation}
+ I;v = I;\pkgof(v) \mapsto v
+ \end{equation}
\end{definition}
\section{The Dependency Resolution Problem}
@@ -336,11 +357,11 @@
Let $I_0$ be an inconsistent installation. We would like to find a
consistent installation that is ``similar to'' $I_0$. This is the
-dependency resolution problem. In a real package system, it
+dependency resolution problem. In a real package manager, it
corresponds to a situation in which the user's requests have resulted
in an invalid state (with unsatisfied dependencies or conflicts); the
-goal of a package manager in this situation is to find a ``small'' and
-``good'' set of changes that result in a valid state.
+goal of the package manager in this situation is to find a ``small''
+and ``good'' set of changes that result in a valid state.
\begin{note}
This problem is poorly defined: ``small'' and ``good'' are not
@@ -377,43 +398,45 @@
Create one package for each variable and for each clause in the SAT
problem. For each variable $x$, let the versions of the
- corresponding package be $x:0$ and $x:1$; for each clause, create
- exactly one version. For each SAT clause let $v_c$ be the package
- corresponding to the clause, and insert $v_c -> S$ into $\D$, where
- for each literal of a variable $x$ appearing in the clause, $S$
- contains $x:0$ if $x$ is a negative literal and $x:1$ if $x$ is a
- positive literal. This reduction is clearly polynomial-time; I
- claim that a solution to this set of dependencies exists if and only
- if a solution to the corresponding SAT problem exists.
+ corresponding package be $x \col 0$ and $x \col 1$; for each clause,
+ create exactly one version. For each SAT clause let $v_c$ be the
+ package corresponding to the clause, and insert $v_c -> S$ into
+ $\D$, where for each literal of a variable $x$ appearing in the
+ clause, $S$ contains $x \col 0$ if $x$ is a negative literal and $x
+ \col 1$ if $x$ is a positive literal. This reduction is clearly
+ polynomial-time; I claim that a solution to this set of dependencies
+ exists if and only if a solution to the corresponding SAT problem
+ exists.
Suppose that there is an assignment that solves the SAT problem.
Define an installation $I$ as follows: if $p$ corresponds to a
clause, $I(p)$ is the single version of $p$; if $p$ corresponds to a
- variable $x$, $I(p)=x:0$ if $x$ is FALSE in the SAT solution and
- $x:1$ if $x$ is TRUE in the SAT solution. Now, consider any
- dependency $v -> S$. From the construction above, $S$ corresponds
- to a clause of the SAT instance. At least one literal in this
- clause must be assigned the value TRUE (otherwise the clause is not
- satisfied); let $x$ be the corresponding variable. If the literal
- is positive, then (by construction) $S$ contains $x:1$ and (since
- $x$ must be assigned the value TRUE) $I(p)=x:1$. On the other hand,
- if the literal is negative, then $S$ contains $x:0$ and $I(p)=x:0$.
- Thus, $I$ is a consistent installation.
+ variable $x$, $I(p)=x \col 0$ if $x$ is FALSE in the SAT solution
+ and $I(p)=x \col 1$ if $x$ is TRUE in the SAT solution. Now,
+ consider any dependency $d=v -> S$. From the construction above,
+ $S$ and $v$ correspond to a clause of the SAT instance. At least
+ one literal in this clause must be assigned the value TRUE
+ (otherwise the clause is not satisfied); let $x$ be the
+ corresponding variable. If the literal is positive, then (by
+ construction) $S$ contains $x \col 1$; since $x$ must be assigned
+ the value TRUE. $I \installs x \col 1$. Hence, $I \satisfies d$.
+ On the other hand, if the literal is negative, then $S$ contains $x
+ \col 0$ and $I \installs x \col 0$, so $I \satisfies d$. Thus, $I$
+ is a consistent installation.
On the other hand, suppose that there is a consistent installation
$I$. For all variables $x$, let $p$ be the corresponding package;
- if $I(p)=x:0$, assign FALSE to $x$, and if $I(p)=x:1$, assign TRUE
- to $x$. Now consider any clause in the SAT problem: from the
- construction above, $\D$ contains a dependency $v_c -> S$ where
- $v_c$ is the single version of the package corresponding to the
- clause. Since $I$ must map the clause's package to $v_c$ and $I$ is
- consistent, there must be a version $v \in S$ such that
- $I(\pkgof{v})=v$; i.e., $I$ installs $v$. But from the
- construction, $v^p$ corresponds to some variable $x$, and $v$ is
- either $x:1$, where $x$ appears as a positive literal in the clause
- or $x:0$, where $x$ appears as a negative literal in the clause.
- Thus, the clause is satisfied, and so the assignment described above
- satisfies all clauses.
+ if $I(p)=x \col 0$, assign FALSE to $x$, and if $I(p)=x \col 1$,
+ assign TRUE to $x$. Now consider any clause in the SAT problem:
+ from the construction above, $\D$ contains a dependency $v_c -> S$
+ where $v_c$ is the single version of the package corresponding to
+ the clause. Since we must have $I \installs v_c$ and since $I$ is
+ consistent, there must be a version $v' \in S$ such that $I
+ \installs v'$. But from the construction, there is some $x$ such
+ that $v$ corresponds to either $x \col 1$, where $x$ appears as a
+ positive literal in the clause or $x \col 0$, where $x$ appears as a
+ negative literal in the clause. Thus, the clause is satisfied, and
+ so the assignment described above satisfies all clauses.
Therefore, dependency resolution is NP-complete.
\end{proof}
@@ -439,27 +462,27 @@
the problem can be declared either solvable or unsolvable on the basis
of a quick analysis of that region of the graph.
-In fact, when even relatively basic search techniques are applied to a
-typical dependency problem, the difficulties that arise are related
-not to a paucity of solutions, but rather to an excess of them. That
-is, finding \emph{a} solution is easy, but finding the \emph{right}
-solution is more problematic. Indeed, in the Debian framework there
-is always at least one solution: removing every package on the system
-will satisfy all the dependencies. However, for obvious reasons, this
-is not a solution that we want to produce!
+In fact, when even relatively basic search techniques are applied to
+many typical dependency problems, the difficulties that arise are
+related not to a paucity of solutions, but rather to an excess of
+them. That is, finding \emph{a} solution is easy, but finding the
+\emph{right} solution is more problematic. Indeed, in the Debian
+framework there is always at least one solution: removing every
+package on the system will satisfy all the dependencies. However, for
+obvious reasons, this is not a solution that we want to produce!
\subsection{Solving Dependencies Through Best-First Search}
-I propose the use of a relatively simple algorithm -- best-first
-search -- to resolve dependencies. To briefly review, best-first
-search works by keeping a priority queue, known as the ``open'' queue,
-of potential (or partial) solutions. The priority queue is sorted
-according to some heuristic function that quantifies the ``goodness''
-of each node (often in terms of nearness to a full solution). In each
-iteration of the algorithm, the ``best'' partial solution is removed
-from the queue. If it is a full solution, the algorithm terminates;
-otherwise, each ``successor'' node is generated and placed in the
-queue.
+This problem statement suggests the use of a relatively simple
+algorithm -- best-first search -- to resolve dependencies. To briefly
+review, best-first search works by keeping a priority queue, known as
+the ``open'' queue, of potential (or partial) solutions. The priority
+queue is sorted according to some heuristic function that quantifies
+the ``goodness'' of each node (often in terms of nearness to a full
+solution). In each iteration of the algorithm, the ``best'' partial
+solution is removed from the queue. If it is a full solution, the
+algorithm terminates; otherwise, each ``successor'' node is generated
+and placed in the queue.
There are two main issues to resolve:
@@ -472,8 +495,8 @@
to a single package. However, this would result in a gigantic
branching factor (over 1500 branches at each step in the current
Debian archive), and it would cause the algorithm to consider
-adjusting packages that are utterly irrelevant to the problem at hand,
-as well as changing a package multiple times (which can lead to
+adjusting packages that were utterly irrelevant to the problem at
+hand, as well as changing a package multiple times (which can lead to
choices being made for reasons that are obscure to the user). A more
focussed approach is needed.
@@ -482,7 +505,8 @@
as to how dependencies should be resolved. If \pkg{A} depends on
\pkg{B}, \pkg{A} is installed, and \pkg{B} is not installed, it is
usually better to install \pkg{B} than to remove \pkg{A}; however, a
-straight count of broken dependencies might prefer either outcome.
+straight count of broken dependencies would consider both solutions to
+be equally ``good''.
\subsubsection{Generating Successors to a Partial Solution}
@@ -504,22 +528,25 @@
generating solutions: a solution should never modify a package twice.
\begin{definition}
- If the original installation was $I_0$, then for any $I$, the
- installation $I'=I;p \mapsto v$ is a successor of $I$ if $v \neq
- I_0(\pkgof{v})$ and $I(\pkgof{v})=I_0(\pkgof{v})$.
+ If the original installation was $I_0$, then for any $I$ and any $d
+ \in \D$ such that $I \not \satisfies d$, the installation $I'=I;v$
+ is a successor of $I$ for $d$ if $v \neq I_0(\pkgof{v})$ and
+ $I(\pkgof{v})=I_0(\pkgof{v})$.
\end{definition}
One might wonder whether this approach risks overlooking solutions:
-for instance, maybe it is really necessary to ``go in circles'' in
-order to find a particular solution. As shown below, if a solution
-cannot be generated by repeatedly using this limited set of
-successors, it is only because there is a ``simpler'' version of the
-solution (one that modifies the states of fewer packages) that is
+for instance, maybe it really is necessary to ``go in circles'' in
+order to find a particular solution. However, as shown below, if a
+solution cannot be generated through the application of the successor
+rule defined above, then there is a ``simpler'' version of that
+solution (one which modifies the states of fewer packages) that can be
generated. To prove this, I first will introduce some definitions and
-notations:
+notation.
\begin{definition}
- Let $I_1$, $I_2$ be installations.
+ Let $I_1$, $I_2$ be installations. The following notation is used
+ to denote the ``distance'' from $I_1$ to $I_2$ (defined as the
+ number of packages whose mappings differ between $I_1$ and $I_2$).
\begin{equation}
\idist{I_1}{I_2}=\len{\{p \st I_1(p) \neq I_2(p)\}}
@@ -532,45 +559,47 @@
$I'(p)=I_1(p)$ or $I'(p)=I_2(p)$.
\end{definition}
+\begin{note}
+ An alternative phrasing is that if $I'$ is a hybrid of $I_1$ and
+ $I_2$, then for all $v$ such that $I' \installs v$, either $I_1
+ \installs v$ or $I_2 \installs v$.
+\end{note}
+
\begin{definition}
- If $I'$ is a successor of $I$ with respect to $I_0$, then $I
- \nsol{I_0} I'$. If there exist $I_1, \dots, I_n$ such that $I_1
- \nsol{I_0} I_2 \nsol{I_0} \dots \nsol{I_0} I_n$, then $I_1
+ If $I'$ is a successor of $I$ with respect to $I_0$ for the
+ dependency $d$, then $I \nsol{I_0}{d} I'$. If there exist $I_1,
+ \dots, I_n$ and $d_1, \dots, d_n$ such that $I_1 \nsol{I_0}{d_1} I_2
+ \nsol{I_0}{d_2} \dots \nsol{I_0}{d_{n-1}} I_n$, then $I_1
\nsolmany{I_0} I_n$.
\end{definition}
\begin{lemma}
Let $I_c$ be any consistent installation (if one exists) and $I_0$
- be any installation. For all hybrids $I$ of $I_0$ and $I_c$, either
- $I$ is consistent or there exists a successor $I'$ of $I$ such that
- $I'$ is a hybrid of $I_0$ and $I_c$, and $\idist{I'}{I_c} <
- \idist{I}{I_c}$.
+ be any installation. For all hybrids $I$ of $I_0$ and $I_c$ and all
+ dependencies $d \in \D$ such that $I \not \satisfies d$, there
+ exists an $I'$ such that $I \nsol{I_0}{d} I'$, $I'$ is a hybrid of
+ $I_0$ and $I_c$, and $\idist{I'}{I_c} < \idist{I}{I_c}$.
\end{lemma}
\begin{proof}
Consider any hybrid $I$ of $I_0$ and $I_c$ such that $I$ is not a
- solution. Since $I$ is inconsistent, there is a dependency $v->S$
- such that $I(\pkgof{v})=v$ and for all $v' \in S$, $I(\pkgof{v'})
- \neq v'$. However, since $I_c$ is consistent, either
- $I_c(\pkgof{v}) \neq v$ or there exists a $v' \in S$ such that
- $I_c(\pkgof{v'})=v'$.
+ solution and any $d=v -> S \in \D$ such that $I \not \satisfies d$.
- Suppose that $I_c(\pkgof{v}) \neq v$. Since $I$ is a hybrid of
- $I_0$ and $I_c$, $I_0(\pkgof{v})=v$. Thus, $I \nsol{I_0} I'$, where
+ Suppose that $I_c \not \installs v$. Since $I$ is a hybrid of $I_0$
+ and $I_c$, $I_0 \installs v$. Thus, $I \nsol{I_0}{d} I'$, where
\begin{equation}
- I'=I;\pkgof{v} \mapsto I_c(\pkgof{v})
+ I'=I;I_c(\pkgof{v})
\end{equation}
- On the other hand, if $I_c(\pkgof{v'})=v'$ for some $v' \in S$, note
- that as before, $I_0(\pkgof{v'}) \neq v'$. Therefore, $I \nsol{I_0}
- I'$, where
+ On the other hand, if $I_c \installs v'$ for some $v' \in S$, then
+ $I_0 \not \installs v'$. Therefore, $I \nsol{I_0}{d} I'$, where
\begin{equation}
- I'=I;\pkgof{v'} \mapsto v'
+ I'=I;v'
\end{equation}
- In either case, clearly $I'$ is a hybrid of $I_0$ and $I_c$, and
+ In either case, clearly $I'$ is a hybrid of $I_0$ and $I_c$ and
$\idist{I'}{I_c}<\idist{I}{I_c}$, proving the lemma.
\end{proof}
@@ -583,10 +612,11 @@
\end{theorem}
\begin{proof}
- Proof is by repeated application of the previous lemma. For each
- inconsistent hybrid $I$ of $I_0$ and $I_c$, let $I^{+}$ be the
- hybrid shown to exist in the previous lemma, and define a sequence
- $I_1, \dots$ as follows:
+ Proof is by repeated application of the previous lemma. Consider
+ any inconsistent hybrid $I$ of $I_0$ and $I_c$. Let $I^{+}$ be the
+ $I'$ shown to exist in the previous lemma for an arbitrary $d$ such
+ that $I \not \satisfies d$, and define a sequence $I_1, \dots$ as
+ follows:
\begin{equation}
I_k = \begin{cases}
@@ -618,57 +648,22 @@
``reward'' avenues of inquiry that decrease the number of unsatisfied
dependencies. This is not, of course, guaranteed to produce a
solution quickly; however, in practice, it seems to be a sufficient
-hint for the algorithm to reach a goal node in under 100 steps.
+hint for the algorithm to reach a goal node in a reasonable number of
+steps\footnote{Most searches seem to converge in under 5000 steps.}.
Finding ``good'' solutions is somewhat more difficult, not least
because of the fact that ``good'' is an ill-defined property. The
experimental implementation of this algorithm in \pkg{aptitude} uses
-the following criteria to assign scores to nodes:
+the following general criteria to assign scores to nodes:
\begin{itemize}
-\item Most importantly, each version of each package is assigned a
- separate score. The score of a potential solution is based in part
- on the \emph{difference} between the total score of the package
- versions it installs and the score of the versions installed by
- $I_0$.\footnote{In principle, a solution's score is simply the total
- score of the package versions it installs; $I_0$ is taken to be a
- baseline for this component of the score because it does not
- affect the ordering of solutions, and it is much more convenient
- to calculate.}
-
- These weights define the priorities of the search. In the current
- aptitude implementation, as of this writing, the scores of the
- versions the package $p$ are assigned as follows\footnote{Very minor
- tweaking is also performed to adjust the score of packages
- according to their Debian-assigned ``priority'', but this affects
- the score by no more than 5 points.}:
-
- \begin{itemize}
- \item If $I_0(p)$ was explicitly selected by the user, it is
- assigned $40$ points.
- \item If $I_0(p)$ was automatically selected, it is assigned $0$
- points. This has the effect of biasing the algorithm towards
- modifying packages whose current states were not explicitly
- requested by the user.
- \item If $I_0$ does not remove $p$, its removal is assigned $-300$
- points. If $p$ is an ``essential'' package, such as the C library
- or the \pkg{dpkg} package manager, its removal is assigned an
- additional $-100000$ points.
- \item If $I_0(p)$ is not the current version of the
- package\footnote{The ``current version'' refers to the version
- installed on the system; $I_0$ is simply a representation of the
- package installation and removal commands issued by the user.},
- then $p$'s current version -- which corresponds to cancelling a
- user request or automatic decision -- is assigned $-50$ points.
- \item Installing $p$ (i.e., the ``default'' version of $p$ if $p$ is
- not currently installed) is assigned $-20$ points.
- \item Upgrading $p$ (i.e., the ``default'' version of $p$ if $p$ is
- currently installed at a different version\footnote{This is a
- slight misnomer, as it includes downgrades to the default
- version.}) is assigned $-15$ points.
- \item Installing a version of $p$ other than the ``default'' version
- (as usual, if $I_0$ does not install this version of $p$) is
- assigned -40 points.
- \end{itemize}
+\item Each version of each package is assigned a separate score. By
+ default, removing any package is heavily penalized, altering
+ packages which were automatically installed recieves a smaller
+ penalty, maintaining the state of an automatic package makes no
+ contribution to the score, and maintaining the state of a manually
+ installed package receives a bonus.\footnote{In actuality, all that
+ is calculated is the difference between the initial total version
+ score and the final total version score.}
\item A penalty is applied to each search node based on its distance
from the root of the search. This works to favor ``simpler''
@@ -676,11 +671,10 @@
\item Nodes that resolve all dependencies are given an additional
bonus -- usually a relatively minor one. Goal nodes are moved
- through the priority queue in the normal manner, rather than
- immediately being returned, in order to ensure that ``better''
- solutions are produced first -- and to make sure that solutions that
+ through the priority queue in the normal manner, rather than being
+ floated directly to its head, in order to ensure that solutions that
are particularly ``bad'' are not produced unless it is absolutely
- necessary.
+ necessary to do so.
\end{itemize}
Thus, letting $B(I)$ be the set of dependencies that are ``broken''
@@ -696,7 +690,9 @@
if $i=j$ and $0$ otherwise). In the current implementation,
$\alpha_B=-100$, $\alpha_L=-10$, and $\alpha_G=50$.
-\section{Avoiding Exponential Blowup}
+\section{Reducing the Branching Factor}
+
+\subsection{One Dependency at a Time}
The algorithm laid out above is sufficient to solve many of the
dependency problems that are encountered in practice. However, some
@@ -722,17 +718,17 @@
Unfortunately, some packages have very many reverse dependencies.
For instance, if $I$ removes the system C library, over a thousand
dependencies will be unsatisfied -- and simply generating the
- successors of this node will require time \emph{quadratic} in the
- number of reverse dependencies of \pkg{libc}; i.e., over a million
- dependencies will be tested. This can impose a significant
- performance penalty on the process of successor generation.
+ successors of this node will require time at least \emph{quadratic}
+ in the number of reverse dependencies of \pkg{libc}. This can
+ impose a significant performance penalty on the process of successor
+ generation.
\item Removing the bottom of a dependency chain.
When an important library such as GTK is removed, it is necessary to
propagate the removal ``up the dependency tree''. However, the
search technique outlined above will search exponentially many
- installations before coming to this solution. Aside from the goal
+ installations before settling on this solution. Aside from the goal
node of ``keep the library on the system'', the first step of the
search will enqueue one node for each package depending on GTK; each
node will remove its corresponding package. As these nodes are
@@ -743,43 +739,135 @@
packages) will be generated.
\end{enumerate}
-The problem of ``too many reverse dependencies'' is relatively easily
-solved. Instead of generating sucecssors for every dependency, it is
-sufficient to generate successors for a single dependency (this is
-evident from the proof of Theorem~\ref{thm:succ-sufficient}; a
-stronger condition will be demonstrated explicitly below). In theory,
-this practice could lead to somewhat less optimal ordering of
-generated solutions, but the decrease in the problem's branching
-factor is well worth it.
-
-The second problem is somewhat trickier, but it can be resolved by
-restructuring the search space. To each solution node $I$, attach a
-set $I_F$ of \emph{forbidden} versions; the successors of $I$ are
-restricted to those which do not install any version in $I_F$. For
-all successors $I'$ of $I$, $I'_F \subseteq I_F$; furthermore, if a
-successor $I'$ of $I$ is generated by removing the source version of a
-dependency, then all of the targets of that dependency are members of
-$I'_F$. This has the effect of forcing the algorithm to ``stick
+There is a simple solution to both of these problems. Instead of
+generating successors for every dependency, it is sufficient to
+generate successors for a single, arbitrary dependency (as shown in
+Theorem~\ref{thm:succ-sufficient}). In theory, this could lead to
+somewhat less optimal ordering of generated solutions, but this
+doesn't seem to be a major problem in practice and the decrease in the
+problem's branching factor is well worth it.
+
+\subsection{Exclude Supersets of Solutions}
+
+One simple way to trim the search tree is to drop any search node $I$
+that is a ``superset'' of a full solution $I_c$ -- meaning that $I_c$
+is a hybrid of $I$ and $I_0$. This has the additional beneficial
+effect of preventing solutions from being offered to the user which
+are just a previously-displayed solution with some extra, redundant
+actions added to it.
+
+\subsection{Forbidden versions}
+
+If we have a choice between removing \pkg{p} and installing \pkg{q},
+and we choose to remove \pkg{p}, why should we ever install \pkg{q}?
+This question leads to yet another way of reducing the problem's
+branching factor.
+
+To each solution node $I$, attach a set $F$ of \emph{forbidden}
+versions; the successors of $I$ are restricted to those which do not
+install any version in $F$. For all successors $I'$ of $I$, let $F'
+\subseteq F$; furthermore, if a successor $I'$ of $I$ is generated by
+removing the source version of a dependency, then all of the targets
+of that dependency are members of $I'_F$. This new successor
+relationship is formally defined in Figure~\vref{fig:excludever}.
+
+\begin{figure}
+ \begin{gather*}
+ \inferrule{I \not \satisfies d \\ d = v -> S \\
+ I(v)=I_0(v) \\ \pkgof{v'}=\pkgof{v} \\ v' \not \in F}
+ {(I,F) \nsol{I_0}{d} (I;v',F \cup S)} \\
+ \inferrule{I \not \satisfies d \\ d = v -> S \\ v' \in S \\
+ I(v')=I_0(v') \\ v' \not \in F}
+ {(I,F) \nsol{I_0}{d} (I;v', F)}
+ \end{gather*}
+
+ \caption{Successor generation with forbidden versions}
+ \label{fig:excludever}
+\end{figure}
+
+This has the effect of forcing the algorithm to ``stick
with'' a decision to forgo installing the targets of a dependency in
favor of shifting the source.
-In combination with the tracking of forbidden versions, the algorithm
-is also modified to detect \emph{essential conflicts}: that is,
-dependencies which cannot be resolved due to the constraints placed on
-the generation of successors. If $d$ is a dependency not satisfied by
-$I$, and if every package version which could resolve a dependency is
-illegal -- either because of the ``only modify it once'' rule or
-because it is a member of $I_F$ -- then $d$ is deemed to be an
-essential conflict and $I$ is discarded without further processing.
+\begin{note}
+ This technique could just as well be applied by expanding the
+ forbidden set when generating successors for the \emph{targets} of a
+ dependency: that is, forbidding a different version of the source of
+ a dependency to be installed. The decision regarding which
+ exclusion principle to use was made on the basis of a conjecture
+ that we are more likely to encounter a ``hard'' dependency problem
+ when moving ``up'' a dependency chain than when moving ``down'' it.
+\end{note}
Of course, it is important to verify that cutting off wide swathes of
the search space in this manner does not impede our ability to
generate solutions:
\begin{theorem}
- TODO.
+ Let $I_c$ be any consistent installation (if one exists) and $I_0$
+ be any installation. There exists an $I_c'$ such that $I_c'$ is a
+ hybrid of $I_0$ and $I_0 \nsolmany{I_0} I_C'$.
\end{theorem}
+\begin{proof}
+ Let $F_0=\emptyset$. I claim that there exists a sequence
+ $(I_1,F_1),\dots$ such that for all $k \geq 0$,
+
+ \begin{itemize}
+ \item For all $v \in F_k$, $I_c \not \installs v$.
+ \item $I_0 \nsolmany{I_0} I_k$
+ \item Either $k=0$, $I_{k-1}$ is consistent and $I_k=I_{k-1}$, or
+ $\idist{I_k}{I_c}<\idist{I_{k-1}}{I_c}$.
+ \end{itemize}
+
+ Proof is by induction on $k$. Suppose that a sequence
+ $(I_1,F_1),\dots,(I_k,F_k)$ exists satisfying the above condition.
+ If $I_k$ is consistent, then let $I_{k+1}=I_k$ and $F_{k+1}=F_k$;
+ the inductive hypothesis is satisfied immediately.
+
+ Otherwise, consider any $d=v -> S \in \D$ such that $I_k \not
+ \satisfies d$ (since $I_k$ is inconsistent, at least one such $d$
+ exists). If there is a $v' \in S$ such that $I_c \installs v'$,
+ then let $I_{k+1}=I_k;v'$ and $F_{k+1}=F_k$. Clearly $I_c \not
+ \installs v''$ for all $v'' \in F_{k+1}$ and
+ $\idist{I_{k+1}}{I_c}<\idist{I_k}{I_c}$; since we additionally have
+ $(I_k,F_k) \nsol{I_0}{d} (I_{k+1},F_k)$, the inductive hypothesis
+ holds.
+
+ If instead $I_c \not \installs v'$ for all $v' \in S$, then since
+ $I_c$ is consistent, $I_c(\pkgof{v}) \neq v$. Let
+ $I_{k+1}=I_k;I_c(\pkgof{v})$ and $F_{k+1}=F_k \cup S$. $I_c \not
+ \installs v''$ for all $v'' \in S$ by definition and clearly
+ $\idist{I_{k+1}}{I_c}<\idist{I_k}{I_c}$. In addition, $I_k
+ \nsol{I_0}{d} I_{k+1}$ by Figure~\ref{fig:excludever}; therefore,
+ the inductive hypothesis holds.
+
+ Thus, the claim is established: such a sequence exists. Following
+ the logic of Theorem~\ref{thm:succ-sufficient}, we can see that for
+ $n=\idist{I_0}{I_c}$, $I_n$ is a consistent installation.
+ Furthermore, from the construction above, $I_n$ is a hybrid of $I_0$
+ and $I_c$. Thus, the theorem is established with $I_c'=I_n$.
+\end{proof}
+
+\subsection{Use Logical Necessity}
+
+In combination with the tracking of forbidden versions, it is also
+possible to detect \emph{forced installations} and \emph{essential
+ conflicts}. A forced installation is one which is logically
+necessary given $I$ and $F$: for instance, if we have $d=v -> \{v'_1,
+v'_2\}$, $I$ has touched $v$ (i.e., $I(v) \neq I_0$), $v'_1 \in F$,
+and $v'_2 \notin I_F$, then the only permissible successor given $d$
+is $(I;v'_1)$. An essential conflict is a dependency for which
+\emph{no} successors can be generated: for instance, if in the
+previous example we instead had $v'_2 \in F$, then $d$ would be an
+essential conflict.
+
+If any essential conflicts exist in an installation $I$, it is
+discarded immediately (rather than, for instance, generating
+successors for all the solvable dependencies). If any forced
+installations exist, they are accumulated and a successor formed by
+adding these installations to $I$ is placed into the open queue.
+
\section{Implementation}
A prototype implementation exists in the experimental branch of