[Aptitude-svn-commit] r3800 - in branches/aptitude-0.3/aptitude: . src/generic/problemresolver

Thu Aug 11 00:55:26 UTC 2005

Author: dburrows
Date: Thu Aug 11 00:55:23 2005
New Revision: 3800

Modified:
   branches/aptitude-0.3/aptitude/ChangeLog
   branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex
Log:
Fix the algorithm's correctness proof, babble about efficiency.

Modified: branches/aptitude-0.3/aptitude/ChangeLog
==============================================================================

--- branches/aptitude-0.3/aptitude/ChangeLog	(original)
+++ branches/aptitude-0.3/aptitude/ChangeLog	Thu Aug 11 00:55:23 2005
@@ -2,6 +2,11 @@
 
 	* src/generic/problemresolver/model.tex:
 
+	  Finish the correctness proof of the algorithm, start blathering
+	  about efficiency.
+
+	* src/generic/problemresolver/model.tex:
+
 	  Start describing how to eliminate "stupid pairs" from solutions,
 	  including a preliminary writeup of the algorithm to use.
 

Modified: branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex
==============================================================================
--- branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex	(original)
+++ branches/aptitude-0.3/aptitude/src/generic/problemresolver/model.tex	Thu Aug 11 00:55:23 2005
@@ -49,7 +49,7 @@
 \newcommand{\nsol}[2]{\overset{#1}{\underset{#2}{\Rightarrow}}}
 \newcommand{\nsolmany}[1]{\overset{#1}{\underset{*}{\Rightarrow}}}
 
-\newcommand{\act}[2]{\begin{array}{c} #1 \\ #2 \end{array}}
+\newcommand{\act}[2]{[#1 :: #2]}
 
 \newcommand{\installs}{\vartriangleright}
 \newcommand{\satisfies}{\vdash}
@@ -857,7 +857,28 @@
   and $I_c$.  Thus, the theorem is established with $I_c'=I_n$.
 \end{proof}
 
-\subsection{Eliminating ``Stupidity''}
+\subsection{Use Logical Necessity}
+
+In combination with the tracking of forbidden versions, it is also
+possible to detect \emph{forced installations} and \emph{essential
+  conflicts}.  A forced installation is one which is logically
+necessary given $I$ and $F$: for instance, if we have $d=v -> \{v'_1,
+v'_2\}$, $I$ has touched $v$ (i.e., $I(v) \neq I_0$), $v'_1 \in F$,
+and $v'_2 \notin I_F$, then the only permissible successor given $d$
+is $(I;v'_1)$.  An essential conflict is a dependency for which
+\emph{no} successors can be generated: for instance, if in the
+previous example we instead had $v'_2 \in F$, then $d$ would be an
+essential conflict.
+
+If any essential conflicts exist in an installation $I$, it is
+discarded immediately (rather than, for instance, generating
+successors for all the solvable dependencies).  If any forced
+installations exist, they are accumulated and a successor formed by
+adding these installations to $I$ is placed into the open queue.
+
+\section{Eliminating ``Stupidity''}
+
+\subsection{Overview}
 
 The technique described in the previous section will eliminate some
 obviously redundant solutions.  But can we do better?  In general, an
@@ -897,6 +918,8 @@
     $\pkgof{v} = \pkgof{v'}$.}  $d_2$ or $v_2$ solves $d_1$.
 \end{definition}
 
+\subsection{The Elimination Algorithm}
+
 Is there a way to manipulate the resolution sequence itself by
 interchanging and/or dropping elements from the sequence so as to
 eliminate all stupid pairs?  The answer is ``yes''!
@@ -945,11 +968,11 @@
     \State Let $I' = I_0;R_1';v_j;R_2';R_3'$
     \If{$Broken(I') = \emptyset$}
       \State Let $R' = R_1',\act{v_j}{d_i},\Call{ResolveFrom}{I_0;R_1';v_j,R_2' \cup R_3'}$
-      \State \textbf{return} \Call{EliminateStupid}{$I_0$, $R'$}
+      \State \textbf{return} \Call{EliminateStupid}{$I_0$, $R'$} \label{line:eliminate-stupid-recur-1}
     \Else
       \State Take any dependency $d_k$ such that $I' \not \satisfies d_k$ and $v_i \satisfies d_k$.
-      \State Let $R' = R_1',\call{ResolveFrom}{I_0;R_1',\{v_i,v_k\} \cup R_2' \cup R_3'}$
-      \State \textbf{return} \Call{EliminateStupid}{$I_0$, $R'$}
+      \State Let $R' = R_1',\act{v_i}{d_k},R_2',\act{v_j}{d_j},R_3'$.
+      \State \textbf{return} \Call{EliminateStupid}{$I_0$, $R'$} \label{line:eliminate-stupid-recur-2}
     \EndIf
   \EndIf
 \EndProcedure
@@ -994,7 +1017,7 @@
 ``sensible'' ordering of the solution and to drop unneeded versions
 from the solution.
 
-The formal proof of correctness follows:
+\subsection{Correctness of \textsc{EliminateStupid}}
 
 \begin{lemma}[Correctness of \textsc{ResolveFrom}]
   If $Broken(I;S)=\emptyset$, $S$ is package-unique, and $v \notin I$
@@ -1067,7 +1090,8 @@
 \begin{proof}
   Proof is by induction on the pair $(l,n)$ where $l$ is the length of
   $R$ and $n$ is the number of stupid pairs in $R$; these pairs are
-  ordered lexicographically, so $(1,1) < (1,100) < (2,10) < (5,20)$.
+  ordered lexicographically, so, for instance, $(1,1) < (1,100) <
+  (2,10) < (5,20)$.
 
   If $n = 0$, then the algorithm terminates immediately.  In this
   case, $R$ is a resolution sequence from the lemma prerequisite, and
@@ -1075,34 +1099,131 @@
   If $l < 2$, then clearly $n = 0$ and the above reasoning holds.
 
   Suppose instead that $n > 0$ and $l \geq 2$; i.e., there is at least
-  one stupid pair in $R$.  As in the algorithm, suppose that
+  one stupid pair in $R$; and suppose further that for all $(n',l') <
+  (n,l)$, the lemma holds.  As in the algorithm, suppose that
   $R_1',\act{v_i}{d_i},R_2',\act{v_j}{d_j},R_3'=R$, where
   $\left(\act{v_i}{d_i},\act{v_j}{d_j}\right)$ is a stupid pair and no
   element of $R_1'$ is implicated in a stupid pair; i.e.,
   $\left(\act{v_i}{d_i},\act{v_j}{d_j}\right)$ is the ``first'' stupid
-  pair in $R$.
-
-  Let $I' = I_0;R_1';v_j;R_2';R_3'$.  If 
+  pair in $R$.\footnote{The astute reader will note that
+    $\act{v_i}{d_i}$ may be implicated in several distinct stupid
+    pairs.  Assume for the time being that an arbitrary stupid pair
+    involving $\act{v_i}{d_i}$ is chosen.}
+
+  Let $I' = I_0;R_1';v_j;R_2';R_3'$.
+
+  If $I'$ is consistent -- i.e., $Broken(I')=\emptyset$, let $R' =
+  R_1',\act{v_j}{d_i},\call{ResolveFrom}{I_0;R_1';v_j,R_2' \cup
+    R_3'}$. By Lemma~\ref{lem:resolve-from-correct},
+  $\Call{ResolveFrom}{I_0;R_1';v_j,R_2' \cup R_3'}$ is a resolution
+  sequence for $I_0;R_1';v_j$ such that $Broken(I_0;R')=\emptyset$.
+  In addition, since each element of
+  $\call{ResolveFrom}{I_0;R_1';v_j,R_2' \cup R_3'}$ is a member of
+  $R_2' \cup R_3'$, the maximum length of $R'$ is
+  $\len{R_1'}+\len{\{v_j\}}+\len{R_2'}+\len{R_3'}=\len{R}-1$.  Thus,
+  by the inductive hypothesis, $\Call{EliminateStupid}{I_0,R'}$ is a
+  resolution sequence for $I_0$ containing no stupid pairs, and hence
+  so is $\Call{EliminateStupid}{I_0,R}$.
+
+  On the other hand, if $I'$ is not consistent, consider any
+  dependency $d_k$ such that $I' \not \satisfies d_k$.  Since
+  $I';v_i=I \satisfies d_k$, we must have $v_i \satisfies d_k$ and
+  hence $R'=R_1',\act{v_i}{d_k},R_2',\act{v_j}{d_j},R_3'$ is a
+  resolution sequence ($R'$ differs from $R$ only in that the
+  installation of $v_i$ is justified by reference to $d_k$ rather than
+  $d_i$); of course, since $I_0;R'=I_0;R$, $Broken(I_0;R')=\emptyset$.
+
+  I claim now that there are strictly fewer stupid pairs in $R'$ than
+  in $R$.  First, note that $\act{v_i}{d_k}$ cannot be implicated in a
+  stupid pair: if it were, that would imply the existence of some
+  $\act{v_x}{d_x} \in R'$ such that $v_x \neq v_i$ and $v_x \satisfies
+  d_k$ -- but then $\act{v_x}{d_x} \in R$ and so we would have $I'
+  \satisfies d_k$, contradicting the choice of $d_k$.  On the other
+  hand, as all other elements of $R'$ occur in $R$, the only stupid
+  pairs occuring among them also occur in $R$.  Thus, $R'$ contains
+  exactly one less stupid pair than $R$; since $\len{R'}=\len{R}$, we
+  can apply the lemma inductively to conclude that
+  $\Call{EliminateStupid}{I_0,R'}$ is a resolution sequence $R''$ such
+  that $Broken(I_0;R'')=\emptyset$ and $R''$ contains no stupid pairs.
 \end{proof}
 
-\subsection{Use Logical Necessity}
+\subsection{Efficiency of \textsc{EliminateStupid}}
+
+Of course, \textsc{EliminateStupid} is a fairly complex algorithm, and
+it would be nice to know how much we're paying for it.  A full and
+careful analysis of the algorithm is beyond the scope of this paper,
+but this section gives a rough estimate of its worst case running
+time.  As hinted at below, though, this really is just an estimate:
+the efficiency of the algorithm depends heavily on the operations used
+to implement its high-level directives.
+
+In the worst case, every single element of the incoming list is
+implicated in a stupid pair and the last stupid pair decreases the
+list length by one and creates a list full of stupid pairs; thus, we
+would expect $O(\len{R}^2)$ recursions of \textsc{EliminateStupid}.
+Each call to \textsc{ResolveFrom} clearly takes $O(\len{S})$ time, so
+even if all operations mentioned in these routines are $O(1)$, the
+worst-case running time of \textsc{EliminateStupid} is about
+$O(\len{R}^3)$.  In fact, however, these operations are not $O(1)$.
+Each recursion of \textsc{ResolveFrom} invokes the following
+operations:
+
+\begin{itemize}
+\item $Broken(I)$ to test whether broken packages exist and to fetch
+  an arbitrary broken package.  This operation is $O(n)$ in the number
+  of package versions, or $O(\len{I})$.
+
+\item Finding a solution of $d$ in $S$.  A reasonable approach is to
+  iterate over the possible solutions of $d$, testing each to see if
+  it is in $S$.  If $S$ is represented by a hash table, the expected
+  efficiency of this step will be $O(\len{d})$; the current
+  implementation uses balanced trees, giving a worst-case efficiency
+  of $O(\len{d}\log \len{S})$ where $\len{d}$ is the number of possible
+  solutions $d$\footnote{This is not quite an accurate count of the
+    complexity of testing whether $d$ is satisfied, but it is
+    reasonably close to correct; in practice, $\len{d}$ is usually
+    very small.}.
+
+\item Some number of ``uninteresting'' constant-time operations.
+\end{itemize}
+
+Thus, the expected running time of $\Call{ResolveFrom}{I,S}$ is
+$O(\len{S}(\len{I}+\len{d})$ where $\len{d}$ represents the largest
+degree of any dependency.  Similarly, the following atomic operations
+are performed in \textsc{EliminateStupid}:
+
+\begin{itemize}
+\item Checking for stupid pairs.
+
+  This might appear to be an $O(n^2)$ operation.  Luckily, by using an
+  associative data structure (i.e., a hash table), the time complexity
+  of this step can be reduced to $O(n\len{d})$.  The procedure is to
+  create a structure mapping versions to sets of versions, then scan
+  through the input sequence.  For each entry $\act{v}{d}$, scan
+  through the list of versions that can ``solve'' $d$; for all such
+  $v'$ other than $v$, add $v$ to the set associated with $v'$.  When
+  this step is complete, each mapping $v'' \mapsto S$ describes a set
+  of stupid pairs $\{(v'',v''') \st v''' \in S\}$.
+
+  It is probably possible to accelerate this step further in the
+  recursive computation by re-using the computations from the previous
+  step.  I have not yet done this, as the current algorithm is simpler
+  and is fast enough in practice.
+
+\item Constructing $I'$ and $R'$, both of which require $O(\len{R})$
+  steps.
+
+\item Testing whether $I'$ has broken dependencies; as before, this
+  requires $O(\len{I'})=O(\len{I_0})$ time.
+
+\item Calls to \textsc{ResolveFrom}; as noted above, these require
+  about $O(\len{I_0}+\len{R})$ time.
+
+\item Recursive calls to \textsc{ElminateStupid}, requiring about
+  $O(\len{R}^2(\len{R}+\len{I_0}))$ time.
+\end{itemize}
 
-In combination with the tracking of forbidden versions, it is also
-possible to detect \emph{forced installations} and \emph{essential
-  conflicts}.  A forced installation is one which is logically
-necessary given $I$ and $F$: for instance, if we have $d=v -> \{v'_1,
-v'_2\}$, $I$ has touched $v$ (i.e., $I(v) \neq I_0$), $v'_1 \in F$,
-and $v'_2 \notin I_F$, then the only permissible successor given $d$
-is $(I;v'_1)$.  An essential conflict is a dependency for which
-\emph{no} successors can be generated: for instance, if in the
-previous example we instead had $v'_2 \in F$, then $d$ would be an
-essential conflict.
 
-If any essential conflicts exist in an installation $I$, it is
-discarded immediately (rather than, for instance, generating
-successors for all the solvable dependencies).  If any forced
-installations exist, they are accumulated and a successor formed by
-adding these installations to $I$ is placed into the open queue.
 
 \section{Non-mandatory Dependencies}