[Aptitude-svn-commit] r3800 - in branches/aptitude-0.3/aptitude: .
src/generic/problemresolver
Daniel Burrows
dburrows at costa.debian.org
Thu Aug 11 00:55:26 UTC 2005
Author: dburrows
Date: Thu Aug 11 00:55:23 2005
New Revision: 3800
Modified:
branches/aptitude-0.3/aptitude/ChangeLog
branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex
Log:
Fix the algorithm's correctness proof, babble about efficiency.
Modified: branches/aptitude-0.3/aptitude/ChangeLog
==============================================================================
--- branches/aptitude-0.3/aptitude/ChangeLog (original)
+++ branches/aptitude-0.3/aptitude/ChangeLog Thu Aug 11 00:55:23 2005
@@ -2,6 +2,11 @@
* src/generic/problemresolver/model.tex:
+ Finish the correctness proof of the algorithm, start blathering
+ about efficiency.
+
+ * src/generic/problemresolver/model.tex:
+
Start describing how to eliminate "stupid pairs" from solutions,
including a preliminary writeup of the algorithm to use.
Modified: branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex
==============================================================================
--- branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex (original)
+++ branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex Thu Aug 11 00:55:23 2005
@@ -49,7 +49,7 @@
\newcommand{\nsol}[2]{\overset{#1}{\underset{#2}{\Rightarrow}}}
\newcommand{\nsolmany}[1]{\overset{#1}{\underset{*}{\Rightarrow}}}
-\newcommand{\act}[2]{\begin{array}{c} #1 \\ #2 \end{array}}
+\newcommand{\act}[2]{[#1 :: #2]}
\newcommand{\installs}{\vartriangleright}
\newcommand{\satisfies}{\vdash}
@@ -857,7 +857,28 @@
and $I_c$. Thus, the theorem is established with $I_c'=I_n$.
\end{proof}
-\subsection{Eliminating ``Stupidity''}
+\subsection{Use Logical Necessity}
+
+In combination with the tracking of forbidden versions, it is also
+possible to detect \emph{forced installations} and \emph{essential
+ conflicts}. A forced installation is one which is logically
+necessary given $I$ and $F$: for instance, if we have $d=v -> \{v'_1,
+v'_2\}$, $I$ has touched $v$ (i.e., $I(v) \neq I_0$), $v'_1 \in F$,
+and $v'_2 \notin I_F$, then the only permissible successor given $d$
+is $(I;v'_1)$. An essential conflict is a dependency for which
+\emph{no} successors can be generated: for instance, if in the
+previous example we instead had $v'_2 \in F$, then $d$ would be an
+essential conflict.
+
+If any essential conflicts exist in an installation $I$, it is
+discarded immediately (rather than, for instance, generating
+successors for all the solvable dependencies). If any forced
+installations exist, they are accumulated and a successor formed by
+adding these installations to $I$ is placed into the open queue.
+
+\section{Eliminating ``Stupidity''}
+
+\subsection{Overview}
The technique described in the previous section will eliminate some
obviously redundant solutions. But can we do better? In general, an
@@ -897,6 +918,8 @@
$\pkgof{v} = \pkgof{v'}$.} $d_2$ or $v_2$ solves $d_1$.
\end{definition}
+\subsection{The Elimination Algorithm}
+
Is there a way to manipulate the resolution sequence itself by
interchanging and/or dropping elements from the sequence so as to
eliminate all stupid pairs? The answer is ``yes''!
@@ -945,11 +968,11 @@
\State Let $I' = I_0;R_1';v_j;R_2';R_3'$
\If{$Broken(I') = \emptyset$}
\State Let $R' = R_1',\act{v_j}{d_i},\Call{ResolveFrom}{I_0;R_1';v_j,R_2' \cup R_3'}$
- \State \textbf{return} \Call{EliminateStupid}{$I_0$, $R'$}
+ \State \textbf{return} \Call{EliminateStupid}{$I_0$, $R'$} \label{line:eliminate-stupid-recur-1}
\Else
\State Take any dependency $d_k$ such that $I' \not \satisfies d_k$ and $v_i \satisfies d_k$.
- \State Let $R' = R_1',\call{ResolveFrom}{I_0;R_1',\{v_i,v_k\} \cup R_2' \cup R_3'}$
- \State \textbf{return} \Call{EliminateStupid}{$I_0$, $R'$}
+ \State Let $R' = R_1',\act{v_i}{d_k},R_2',\act{v_j}{d_j},R_3'$.
+ \State \textbf{return} \Call{EliminateStupid}{$I_0$, $R'$} \label{line:eliminate-stupid-recur-2}
\EndIf
\EndIf
\EndProcedure
@@ -994,7 +1017,7 @@
``sensible'' ordering of the solution and to drop unneeded versions
from the solution.
-The formal proof of correctness follows:
+\subsection{Correctness of \textsc{EliminateStupid}}
\begin{lemma}[Correctness of \textsc{ResolveFrom}]
If $Broken(I;S)=\emptyset$, $S$ is package-unique, and $v \notin I$
@@ -1067,7 +1090,8 @@
\begin{proof}
Proof is by induction on the pair $(l,n)$ where $l$ is the length of
$R$ and $n$ is the number of stupid pairs in $R$; these pairs are
- ordered lexicographically, so $(1,1) < (1,100) < (2,10) < (5,20)$.
+ ordered lexicographically, so, for instance, $(1,1) < (1,100) <
+ (2,10) < (5,20)$.
If $n = 0$, then the algorithm terminates immediately. In this
case, $R$ is a resolution sequence from the lemma prerequisite, and
@@ -1075,34 +1099,131 @@
If $l < 2$, then clearly $n = 0$ and the above reasoning holds.
Suppose instead that $n > 0$ and $l \geq 2$; i.e., there is at least
- one stupid pair in $R$. As in the algorithm, suppose that
+ one stupid pair in $R$; and suppose further that for all $(n',l') <
+ (n,l)$, the lemma holds. As in the algorithm, suppose that
$R_1',\act{v_i}{d_i},R_2',\act{v_j}{d_j},R_3'=R$, where
$\left(\act{v_i}{d_i},\act{v_j}{d_j}\right)$ is a stupid pair and no
element of $R_1'$ is implicated in a stupid pair; i.e.,
$\left(\act{v_i}{d_i},\act{v_j}{d_j}\right)$ is the ``first'' stupid
- pair in $R$.
-
- Let $I' = I_0;R_1';v_j;R_2';R_3'$. If
+ pair in $R$.\footnote{The astute reader will note that
+ $\act{v_i}{d_i}$ may be implicated in several distinct stupid
+ pairs. Assume for the time being that an arbitrary stupid pair
+ involving $\act{v_i}{d_i}$ is chosen.}
+
+ Let $I' = I_0;R_1';v_j;R_2';R_3'$.
+
+ If $I'$ is consistent -- i.e., $Broken(I')=\emptyset$, let $R' =
+ R_1',\act{v_j}{d_i},\call{ResolveFrom}{I_0;R_1';v_j,R_2' \cup
+ R_3'}$. By Lemma~\ref{lem:resolve-from-correct},
+ $\Call{ResolveFrom}{I_0;R_1';v_j,R_2' \cup R_3'}$ is a resolution
+ sequence for $I_0;R_1';v_j$ such that $Broken(I_0;R')=\emptyset$.
+ In addition, since each element of
+ $\call{ResolveFrom}{I_0;R_1';v_j,R_2' \cup R_3'}$ is a member of
+ $R_2' \cup R_3'$, the maximum length of $R'$ is
+ $\len{R_1'}+\len{\{v_j\}}+\len{R_2'}+\len{R_3'}=\len{R}-1$. Thus,
+ by the inductive hypothesis, $\Call{EliminateStupid}{I_0,R'}$ is a
+ resolution sequence for $I_0$ containing no stupid pairs, and hence
+ so is $\Call{EliminateStupid}{I_0,R}$.
+
+ On the other hand, if $I'$ is not consistent, consider any
+ dependency $d_k$ such that $I' \not \satisfies d_k$. Since
+ $I';v_i=I \satisfies d_k$, we must have $v_i \satisfies d_k$ and
+ hence $R'=R_1',\act{v_i}{d_k},R_2',\act{v_j}{d_j},R_3'$ is a
+ resolution sequence ($R'$ differs from $R$ only in that the
+ installation of $v_i$ is justified by reference to $d_k$ rather than
+ $d_i$); of course, since $I_0;R'=I_0;R$, $Broken(I_0;R')=\emptyset$.
+
+ I claim now that there are strictly fewer stupid pairs in $R'$ than
+ in $R$. First, note that $\act{v_i}{d_k}$ cannot be implicated in a
+ stupid pair: if it were, that would imply the existence of some
+ $\act{v_x}{d_x} \in R'$ such that $v_x \neq v_i$ and $v_x \satisfies
+ d_k$ -- but then $\act{v_x}{d_x} \in R$ and so we would have $I'
+ \satisfies d_k$, contradicting the choice of $d_k$. On the other
+ hand, as all other elements of $R'$ occur in $R$, the only stupid
+ pairs occuring among them also occur in $R$. Thus, $R'$ contains
+ exactly one less stupid pair than $R$; since $\len{R'}=\len{R}$, we
+ can apply the lemma inductively to conclude that
+ $\Call{EliminateStupid}{I_0,R'}$ is a resolution sequence $R''$ such
+ that $Broken(I_0;R'')=\emptyset$ and $R''$ contains no stupid pairs.
\end{proof}
-\subsection{Use Logical Necessity}
+\subsection{Efficiency of \textsc{EliminateStupid}}
+
+Of course, \textsc{EliminateStupid} is a fairly complex algorithm, and
+it would be nice to know how much we're paying for it. A full and
+careful analysis of the algorithm is beyond the scope of this paper,
+but this section gives a rough estimate of its worst case running
+time. As hinted at below, though, this really is just an estimate:
+the efficiency of the algorithm depends heavily on the operations used
+to implement its high-level directives.
+
+In the worst case, every single element of the incoming list is
+implicated in a stupid pair and the last stupid pair decreases the
+list length by one and creates a list full of stupid pairs; thus, we
+would expect $O(\len{R}^2)$ recursions of \textsc{EliminateStupid}.
+Each call to \textsc{ResolveFrom} clearly takes $O(\len{S})$ time, so
+even if all operations mentioned in these routines are $O(1)$, the
+worst-case running time of \textsc{EliminateStupid} is about
+$O(\len{R}^3)$. In fact, however, these operations are not $O(1)$.
+Each recursion of \textsc{ResolveFrom} invokes the following
+operations:
+
+\begin{itemize}
+\item $Broken(I)$ to test whether broken packages exist and to fetch
+ an arbitrary broken package. This operation is $O(n)$ in the number
+ of package versions, or $O(\len{I})$.
+
+\item Finding a solution of $d$ in $S$. A reasonable approach is to
+ iterate over the possible solutions of $d$, testing each to see if
+ it is in $S$. If $S$ is represented by a hash table, the expected
+ efficiency of this step will be $O(\len{d})$; the current
+ implementation uses balanced trees, giving a worst-case efficiency
+ of $O(\len{d}\log \len{S})$ where $\len{d}$ is the number of possible
+ solutions $d$\footnote{This is not quite an accurate count of the
+ complexity of testing whether $d$ is satisfied, but it is
+ reasonably close to correct; in practice, $\len{d}$ is usually
+ very small.}.
+
+\item Some number of ``uninteresting'' constant-time operations.
+\end{itemize}
+
+Thus, the expected running time of $\Call{ResolveFrom}{I,S}$ is
+$O(\len{S}(\len{I}+\len{d})$ where $\len{d}$ represents the largest
+degree of any dependency. Similarly, the following atomic operations
+are performed in \textsc{EliminateStupid}:
+
+\begin{itemize}
+\item Checking for stupid pairs.
+
+ This might appear to be an $O(n^2)$ operation. Luckily, by using an
+ associative data structure (i.e., a hash table), the time complexity
+ of this step can be reduced to $O(n\len{d})$. The procedure is to
+ create a structure mapping versions to sets of versions, then scan
+ through the input sequence. For each entry $\act{v}{d}$, scan
+ through the list of versions that can ``solve'' $d$; for all such
+ $v'$ other than $v$, add $v$ to the set associated with $v'$. When
+ this step is complete, each mapping $v'' \mapsto S$ describes a set
+ of stupid pairs $\{(v'',v''') \st v''' \in S\}$.
+
+ It is probably possible to accelerate this step further in the
+ recursive computation by re-using the computations from the previous
+ step. I have not yet done this, as the current algorithm is simpler
+ and is fast enough in practice.
+
+\item Constructing $I'$ and $R'$, both of which require $O(\len{R})$
+ steps.
+
+\item Testing whether $I'$ has broken dependencies; as before, this
+ requires $O(\len{I'})=O(\len{I_0})$ time.
+
+\item Calls to \textsc{ResolveFrom}; as noted above, these require
+ about $O(\len{I_0}+\len{R})$ time.
+
+\item Recursive calls to \textsc{ElminateStupid}, requiring about
+ $O(\len{R}^2(\len{R}+\len{I_0}))$ time.
+\end{itemize}
-In combination with the tracking of forbidden versions, it is also
-possible to detect \emph{forced installations} and \emph{essential
- conflicts}. A forced installation is one which is logically
-necessary given $I$ and $F$: for instance, if we have $d=v -> \{v'_1,
-v'_2\}$, $I$ has touched $v$ (i.e., $I(v) \neq I_0$), $v'_1 \in F$,
-and $v'_2 \notin I_F$, then the only permissible successor given $d$
-is $(I;v'_1)$. An essential conflict is a dependency for which
-\emph{no} successors can be generated: for instance, if in the
-previous example we instead had $v'_2 \in F$, then $d$ would be an
-essential conflict.
-If any essential conflicts exist in an installation $I$, it is
-discarded immediately (rather than, for instance, generating
-successors for all the solvable dependencies). If any forced
-installations exist, they are accumulated and a successor formed by
-adding these installations to $I$ is placed into the open queue.
\section{Non-mandatory Dependencies}
More information about the Aptitude-svn-commit
mailing list