[cut-team] Ideas for the rolling release

Mon Aug 16 19:16:31 UTC 2010

On Mon, Aug 16, 2010 at 02:42, Raphael Hertzog <hertzog at debian.org> wrote:
> I am one of those that believe that testing is mostly usable but we have a
> bunch of DDs who think the opposite and who regularly tell users that
> testing is not usable. I think "rolling" is trying to fix some of the
> perceived problems that lead those developers to make those claims.

Well, what are the reasons for those claims? For me (as one of the DDs
who'll tell people not to use testing), the lack of an installer, and
the lack of specific security support are my major concerns. I think
snapshots would alleviate the installability issue; and either
security updates for the snapshots or possibly just fixing my
understanding of security support for testing would resolve the other
issue.

I /think/ Lucas's concern is more along the lines of "it's perfectly
usable, but it's boring because neat new stuff takes too long to get
there".

I /think/ Drake's concern (based on what he's working on) is along the
lines of "it's usually usable, but every now and then the release team
break it to get a transition through or whatever".

>> (If you've got some metric by which testing is lacking, it would be
>> very interesting to (a) monitor it, and (b) talk to the release team
>> about getting britney to automatically consider it when doing updates.
>> But obviously, that only works if it's a measurable problem, not just
>> an anecdotal one)
> AFAIK the release team is not working much on improving britney (and I
> don't blame them, I guess it's not an easy task). I doubt that coming up
> with ideas is enough; working code is probably better, and for that
> a suite where we can experiment is probably worth it at the start.

Well, I expect that they'd accept patches, and I'm pretty sure I could
still put some together -- I just don't have any ideas on new metrics
to apply to testing that would be particularly
interesting/informative/whatever.

Having a new suite is fine -- but without the metrics how do you tell
it's actually doing better? How will you show people it's not just a
matter of it being your baby, and therefore you think it's the cutest
thing ever?

(For testing, we had the criteria of RC bug count and individual
package installability. You can compare the green and red lines on
bugs.debian.org/release-critical to see how the bug count worked out,
and britney generates uninstallability counts and puts them under
http://release.debian.org/britney/ so you can track the other.)

> Why is t-p-u not used much?

I'd say it's because the release team don't accept packages from t-p-u very often.

> - It's a lot of work to prepare those uploads and you're not sure of the
>  resulting dependencies without manual inspection.
>  => what about auto-building sid in testing (when versions differ) and
>     making the resulting suite available to britney?

In that case you run straight into the next problem you cite:

> - those packages have not been tested in sid

The following idea:

>  => encourage some testing users to run with the sid-built-in-testing
>     activated and consider those packages only after a delay (somewhat
>     longer than the usual unstable delay to favor the binaries from
>     unstable first)

is just as easy to deal with for t-p-u -- encourage testing users to
have t-p-u in their sources.list. If it's handled the same way as p-u
(that is, stuff gets queued for approval by the release team before
being published in p-u on the ftp site), that seems reasonably
plausible.
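For concreteness, a testing user opting in would add something like the
following to /etc/apt/sources.list. This is only a sketch: the mirror is
an example, and it assumes t-p-u stays published as its own suite
alongside testing on the ftp site.

```
# /etc/apt/sources.list: testing plus testing-proposed-updates
# (example mirror; t-p-u assumed to be published next to testing)
deb http://ftp.debian.org/debian testing main
deb http://ftp.debian.org/debian testing-proposed-updates main
```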

I just don't think the release team sees a need here, which makes me
wonder if there actually is one, which in turn makes me look for
metrics...

> That's what I thought too. I wanted to write it down but ended up not
> doing it because in the end it's worth thinking about why the freeze would
> not be the perfect time for the ports to catch up...

The main reason is that upstream often doesn't care about ports, and
if new features don't work on ports, the only time people start caring
is when Debian users file RC bugs; the release team then prods
maintainers about those bugs, and that results in patches going
upstream. If that gets delayed by months because the porters are happy
staying a bit out of date for those packages, upstream is going to
diverge further, and finding a fix is going to be that much harder.

The sarge release cycle had some major problems with glibc on some
architectures; those took ages to get fixed (about 7 months) even with
regular attention from the RM at the time, and caused lots of delays
of exactly the sort you're worrying about [0]. But allowing that to
fester would have meant two things: taking even longer to get it fixed
(and if done during the freeze, blocking uploads of other packages
during that time), and risking introducing packages into testing that
end up being incompatible with the eventual fix.

[0] http://lists.debian.org/debian-devel-announce/2003/03/msg00006.html

> We're slowed down by various ports

I don't think that claim is actually true these days (or to be more
precise, I don't think it's significant; I'm sure some packages get a
few days extra delay every now and then, but that's pretty minor
compared to a kernel that's been out of date for almost six months,
e.g.).

Happy to be persuaded otherwise by actual metrics, though.

>> Hypothetically, I think some of those might even be good ideas; but in
>> practice the likelihood that it just confuses developers and makes the
>> release team's job harder still seems to outweigh the benefits to me
>> (adding a suite or changing testing are both likely to be confusing,
>> in different ways).
> Huh. Wasn't testing confusing when we introduced it?

Sure -- see [1] for what I thought it took to explain it before it was
rolled out. It also took two and a half years of discussion before it
got into the archive [2]-[3], and that was with fewer packages and
architectures to deal with than today, and a very clear separation
between stable, testing and unstable from day one, and metrics
demonstrating the benefits of the approach [4], [5].

[1] http://lists.debian.org/debian-devel/2000/08/msg00906.html
[2] http://lists.debian.org/debian-devel/1998/05/msg01695.html
[3] http://lists.debian.org/debian-devel-announce/2000/12/msg00011.html
[4] http://lists.debian.org/debian-devel/2000/04/msg00800.html
[5] http://lists.debian.org/debian-devel/2000/05/msg00352.html

> This is a rather poor justification for not doing anything.

It's not a reason to never do anything; it's a reason to first make
the benefits really clear, and to provide enough explanation that it's
obvious we know what's going on.

To emphasise: I do think testing can be improved in some way here; I
just don't know what that way is, or how it could realistically
happen. I'm only being difficult because (a) I think if you can
convince me you'll have better arguments to convince everyone else,
and (b) I won't be able to help until I understand what the exact
problem to be solved is.

Cheers,
aj

-- 
Anthony Towns <aj at erisian.com.au>