[Debtorrent-devel] Fwd: BitTorrent Protocol Expansion (Google SoC)

Cameron Dale camrdale at gmail.com
Fri Apr 13 01:50:20 UTC 2007


---------- Forwarded message ----------
From: Cameron Dale <camrdale at gmail.com>
Date: Apr 7, 2007 6:58 PM
Subject: Re: BitTorrent Protocol Expansion (Google SoC)
To: Anthony Towns <aj at azure.humbug.org.au>


On 4/6/07, Anthony Towns <aj at azure.humbug.org.au> wrote:
> On Fri, Apr 06, 2007 at 11:08:29AM -0700, Cameron Dale wrote:
> > My idea for this proposal is that you don't have a new torrent every day,
>
> Is that necessarily desirable? Wouldn't it cause problems if you're on
> the i'th Packages file, but most of the peers in the torrent are on the
> j'th, so when you connect to a random peer, and try to download a piece
> of a package that's been updated between i and j, they're unlikely to have
> it (or want it)?

It's the only way (that I can see) to solve the problem of "all
downloaders of a package need to be aware of all other downloaders"
and therefore get the desired efficiency from the download. I think
though, that if people update as much as you say they do below, then
someone on the i'th Packages file is going to have a large number of
packages in common with the peers on the j'th Packages file, provided
|i-j| is not too large, since not much of the archive changes on a
daily basis.
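
To make the overlap argument concrete, here is a rough Python sketch. It
is only an illustration: the package names and versions are made up, and
common_fraction is a hypothetical helper, not part of apt or BitTorrent.
It counts how many (package, version) pairs from one Packages file a
peer on another Packages file would also have.

```python
def common_fraction(mine, theirs):
    """Fraction of my (package, version) pairs that the other peer also has."""
    if not mine:
        return 0.0
    shared = set(mine.items()) & set(theirs.items())
    return len(shared) / len(mine)


# Hypothetical snapshots of the i'th and j'th Packages files: name -> version.
packages_i = {"foo": "1.0-1", "bar": "2.3-1", "baz": "0.9-2", "qux": "4.1-1"}
packages_j = {"foo": "1.0-1", "bar": "2.3-2", "baz": "0.9-2", "qux": "4.1-1"}

# Only "bar" changed between the two snapshots, so 3 of 4 entries match.
print(common_fraction(packages_i, packages_j))  # 0.75
```

The real fraction depends on how much of the archive changes between the
two files, which is exactly the number we don't have yet.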

> Well, don't forget that we're updating the Packages file twice a day
> at the moment, and expecting that to increase. Ubuntu does it (up to)
> 48 times a day, iirc, and we'd certainly like to consider between 4 and
> 12 times a day.

All the more reason not to make new torrents every time. In my
solution, it doesn't matter how many times you update the Packages
file in a day, only how many packages within the file are updated and
therefore need new piece numbers (unless you updated the same package
more than a couple of times in one day, which I see as pretty rare).
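
A minimal sketch of that idea, under loud assumptions: the class and
method names are hypothetical, and each package version gets exactly one
piece number (a real package would span several pieces). The point is
only that re-publishing an unchanged Packages file uses no new numbers.

```python
class PieceNumberAllocator:
    """Hands out sequential piece numbers, one per new (package, version)."""

    def __init__(self):
        self.next_piece = 0
        self.assigned = {}  # (package, version) -> piece number

    def update(self, packages):
        """Process one Packages file; return how many new numbers were used."""
        used = 0
        for key in sorted(packages.items()):
            if key not in self.assigned:
                self.assigned[key] = self.next_piece
                self.next_piece += 1
                used += 1
        return used


alloc = PieceNumberAllocator()
first = alloc.update({"foo": "1.0-1", "bar": "2.3-1"})   # 2 new numbers
second = alloc.update({"foo": "1.0-1", "bar": "2.3-1"})  # 0, nothing changed
third = alloc.update({"foo": "1.0-1", "bar": "2.3-2"})   # 1, only bar changed
print(first, second, third)
```

However many times a day update() runs, the counter only advances when a
package version actually changes.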

> I'm not sure 100 days ago is interesting anyway though. Presumably you can
> expect to mostly have a few different likely frequencies for testing/unstable
> users:
>
>         * obsessive-compulsive updating of every change, ASAP
>         * once every 12, 24 or 48 hours
>         * once or twice a week
>
> I wouldn't really expect it to be all that interesting to worry much beyond
> that frequency -- you're not going to have enough peers that low for it to be
> interesting, afaics.

I'm quite a bit behind these myself: I have a testing box whose
Packages file I probably only update about once a month, and an
unstable box I update maybe once every 2 weeks. I could be in the
minority though, I'm not sure. Some numbers on this would be nice, but
I have no idea where we could get them from.

> But even so, at once a week with 12 times a day, you're potentially up
> to 84 different Packages files anyway (though currently no more than 14).

As I said above, this makes it even more beneficial to not have a new
torrent every time, as each torrent will have a smaller number of
peers in it as the number of torrents increases. Then it could be
really difficult to find a peer in your torrent with the piece/package
you're looking for, especially if that piece is not one of the most
popular ones.
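
The arithmetic behind this can be sketched as follows. The function
names and the peer count are hypothetical, and the even spread of peers
across torrents is an assumption made purely for illustration.

```python
def packages_files_in_use(updates_per_day, days_between_upgrades):
    """Number of distinct Packages files peers could be on."""
    return updates_per_day * days_between_upgrades


def average_peers_per_torrent(total_peers, torrents):
    """Average swarm size if peers were spread evenly (an assumption)."""
    return total_peers / torrents


print(packages_files_in_use(12, 7))  # 84, the figure quoted above
print(packages_files_in_use(2, 7))   # 14, the current figure
print(average_peers_per_torrent(8400, packages_files_in_use(12, 7)))  # 100.0
```

With one torrent per Packages file, the same hypothetical 8400 peers
would be split into swarms of about 100 instead of sharing one swarm.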

> Another aspect: if you're trying to share amongst all those peers, you don't
> actually have to participate in all the torrents. You can do it indirectly
> instead:
>
>              torrent a           torrent b
> Peer 1           Y
> Peer 2           Y                  Y
> Peer 3                              Y
>
> So if Peer 1 has a piece that's common across both torrents, Peer 2 will
> get it via torrent a, then be able to share it with Peer 3 via torrent b.

Yes, but the chances of that happening are slim, as it needs to be a
package that Peer 2 wants as well. Add in 84 possible torrents, and
the chances decrease even more. I don't think it's something we could
depend on as being possible.

> > Whether you call it a torrent or something else, you need a giant swarm of peers
> > all talking to each other, no matter what day they downloaded the Packages file
> > on, since they will have something like 90% of the pieces in common. And when
> > one says to another, I have package foo, they need to know they're talking about
> > the same version. Since peers within a torrent only exchange information in the
> > form of piece numbers, these need to uniquely identify a package AND version.
>
> Treating the path in the pool (pool/main/g/gamin/libgamin0_0.1.7-4_powerpc.deb)
> as unique-per-file should be fine for that in almost all cases, fwiw.

If you mean communicating the path as the unique piece identifier,
then this is the same as using the SHA1 hash of the piece as the piece
number, instead of using some kind of sequential piece numbering. It
therefore prevents you from transmitting or storing piece information
as bitfields, which I thought we agreed was a necessary thing.

Unless you're suggesting another layer of communication, in which
peers that think they have the same piece communicate the path of the
piece so they can confirm that it is in fact the same version as well.
I think this adds too much communication and complexity though for it
to be feasible.
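
The bitfield concern can be put in rough numbers. This is a sketch with
made-up piece counts and hypothetical function names: a bitfield costs
one bit per piece in the torrent, while identifying pieces by SHA1 hash
(or by path, which is longer still) costs at least 20 bytes per piece
announced.

```python
SHA1_BYTES = 20  # size of one SHA1 digest


def bitfield_bytes(total_pieces):
    """Bytes needed for a bitfield with one bit per sequential piece number."""
    return (total_pieces + 7) // 8


def hash_list_bytes(pieces_held):
    """Bytes needed to list held pieces by their SHA1 hashes instead."""
    return SHA1_BYTES * pieces_held


# Hypothetical torrent: 100000 pieces in total, of which a peer holds 5000.
print(bitfield_bytes(100000))  # 12500
print(hash_list_bytes(5000))   # 100000
```

Under these assumed numbers the hash list is eight times larger than the
bitfield, even though the peer holds only 5% of the pieces.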

> The overall lifespan of etch looks like:
>
>         22 months as testing
>         18 (?) months as stable
>         18 months as oldstable (with security support and possibly point updates)
>
> That's 58 months or just under five years, longer if lenny takes more
> than 18 months to release, and also if security support gets extended.

I'm not too worried about the length of time as stable or for security
support, as the number of updates will be quite small. The 22 months
as testing is worrisome, as it may lead to a large number of unique
piece numbers being needed. It has occurred to me too that this will
lead to the unique numbers being quite large by the time the version
gets released, which could make the sharing that goes on while the
release is stable less efficient due to the long bitfields
transmitted. Again, some compression could help though.
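
As a rough sketch of why compression could help: assume (hypothetically)
that after a long testing period a million piece numbers have been
handed out, and a peer holds only the most recent 10000 of them, since
the older numbers belong to superseded versions. The bitfield is then
mostly zeros, which zlib from the Python standard library compresses
well. The piece counts and the make_bitfield helper are illustrative
assumptions.

```python
import zlib


def make_bitfield(total_pieces, held):
    """Build a bitfield with the high bit of byte 0 standing for piece 0."""
    field = bytearray((total_pieces + 7) // 8)
    for piece in held:
        field[piece // 8] |= 0x80 >> (piece % 8)
    return bytes(field)


total = 1_000_000                   # hypothetical count of numbers issued
held = range(990_000, 1_000_000)    # peer holds only the newest pieces

raw = make_bitfield(total, held)
packed = zlib.compress(raw)

print(len(raw))     # 125000 bytes uncompressed
print(len(packed))  # far smaller, since the field is one long run of zeros
```

A real peer's bitfield would be less regular than this, so the saving
would be smaller, but long runs of retired piece numbers should still
compress well.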

> Note that the "testing" lifetime has the suite's pieces vary from an
> exact match of one stable release to an exact match of the next stable
> release; which is usually a pretty major variation.

Right, so testing's unique piece numbers would reset when the
distribution was released, as it would now be a new torrent. The steps
required for this to happen automatically for someone tracking testing
are something to think about though. I expect they would just drop the
old torrent and get the new one. Maybe some delay before this happens
would be useful though, as they have a lot of the pieces needed for
people installing the new stable version, and if they could share them
it would definitely help. I'm not sure if this is possible though.

> sid and experimental don't have a defined endpoint; I'm not sure what
> you'd want to do about them. I'm not sure what (if anything) you'd do
> when a new suite (like lenny) gets introduced either.

I'm not sure what to do with sid and experimental either.

> Yup, we got word a couple of hours ago, only nine slots. :(

That's low. I was definitely expecting more. Any idea you can give me
on this project's chances?

> > I'll update my proposal with some of these unique piece number ideas, as I am
> > starting to think that is the way to go. Thanks for all the thoughts and ideas.
>
> Sweet. Worth adding a comment to your app with a pointer to it.

I'll do that.

> Daniel Burrows <dburrows at debian.org> and Michael Vogt <mvo at debian.org>
> have both offered to help (particularly with apt-acquire code if
> necessary) in the private comments on your app btw.

Great! I see they're strong in the apt department, which will be
useful as it is my weakest point in this project.

I'm having some trouble judging the tone of your emails sometimes. We
seem to be going back and forth a lot on the same issue, and I'm not
sure if it's because you really dislike my proposal, don't understand
it, or are just trying to generate discussion (maybe as a test?).
Anyway, just trying to avoid any confusion and figure out where we're
going with this, no worries either way.

Cameron
