[Debtorrent-devel] Fwd: BitTorrent Protocol Expansion (Google SoC)

Cameron Dale camrdale at gmail.com
Fri Apr 13 01:49:15 UTC 2007


---------- Forwarded message ----------
From: Anthony Towns <aj at azure.humbug.org.au>
Date: Apr 5, 2007 11:34 PM
Subject: Re: BitTorrent Protocol Expansion (Google SoC)
To: Cameron Dale <camrdale at gmail.com>


On Thu, Apr 05, 2007 at 02:23:43PM -0700, Cameron Dale wrote:
> >> "Proposal A"
> >> "communicate all torrent information in the Packages file"
> >> "pieces can no longer be numbered"
> I think a same-ordered Packages file is something we definitely can't
> depend on. If I download the Packages file today and download bar, then
> bar updates overnight, and tomorrow you download the new Packages file
> and try to download the new bar from me because I tell you I have piece
> 6, there are going to be problems.

Right, but pieces are related to a torrent, and if you're looking
at tomorrow's Packages files, that's a different torrent anyway. The
simplest example there might as well be:

        Package: bar
        Version: 2.3
        Size: 1023

        Package: foo
        Version: 1.0
        Size: 2045

then:

        Package: bar
        Version: 2.3
        Size: 1023

        Package: baz
        Version: 4.5
        Size: 518

        Package: foo
        Version: 1.0
        Size: 2045

In which case if you want/have "foo 1.0" you're talking about piece 3
on day two, instead of piece 2 on day one. (Packages files are almost always
alphabetical by package name, fwiw)

But if you're treating that as two different torrents (ie, a torrent
per Packages file), then you're just participating in two torrents that
share a couple of files on disk.
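
To make the renumbering concrete, here is a minimal sketch in Python. It is
not DebTorrent code: it assumes one piece per package, numbered from 1 in
the order the stanzas appear, and the piece_numbers helper is hypothetical.

```python
# Hypothetical illustration: one piece per package, numbered from 1
# in the order the stanzas appear in the Packages file.
def piece_numbers(packages):
    """Map (name, version) -> 1-based piece number by position."""
    return {pkg: i for i, pkg in enumerate(packages, start=1)}

day_one = [("bar", "2.3"), ("foo", "1.0")]
day_two = [("bar", "2.3"), ("baz", "4.5"), ("foo", "1.0")]

foo = ("foo", "1.0")
print(piece_numbers(day_one)[foo])  # 2
print(piece_numbers(day_two)[foo])  # 3
```

The numbers only mean something relative to one particular Packages file,
which is why each file is treated as its own torrent.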

That's probably necessary anyway, in order to treat testing and unstable
as separate torrents, I guess; and possibly testing/i386 and testing/amd64
depending on how arch:all packages get treated.

> Things get even worse if a package
> grows in size and requires a new piece, then all packages after it in
> the Packages file are offset by one.

Right, but I'd view that as a different Packages file, by definition. That
way you can just use the sha1sum of the uncompressed Packages file to
identify the torrent, whether you get it from http://ftp.debian.org/,
or a local mirror, or put it together from diffs or whatever.
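
A rough sketch of that identification step, assuming the torrent id is
simply the hex SHA-1 of the uncompressed Packages bytes. The torrent_id
name and the sample stanzas are illustrative only, not an agreed format.

```python
import hashlib

def torrent_id(packages_bytes: bytes) -> str:
    """Hex SHA-1 of the uncompressed Packages content (hypothetical helper)."""
    return hashlib.sha1(packages_bytes).hexdigest()

day_one = (b"Package: bar\nVersion: 2.3\nSize: 1023\n\n"
           b"Package: foo\nVersion: 1.0\nSize: 2045\n")
day_two = (b"Package: bar\nVersion: 2.3\nSize: 1023\n\n"
           b"Package: baz\nVersion: 4.5\nSize: 518\n\n"
           b"Package: foo\nVersion: 1.0\nSize: 2045\n")

# The id depends only on the bytes, not on where they were fetched from;
# any change to the file gives a different id, hence a different torrent.
print(torrent_id(day_one) == torrent_id(bytes(day_one)))  # True
print(torrent_id(day_one) == torrent_id(day_two))         # False
```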

> Since these numbers are kind of arbitrary, I was suggesting we just do
> away with them and instead use the SHA1 as the piece number. Then
> instead of saying "I have piece 18001" you would say "I have piece
> 9425fa8de16f6283365f6bee87f405da16a203e6". The only reason to have piece
> numbers that I can see is for the BITFIELD communication (more below on
> that).

There's that, which I definitely agree on, and there's also that keeping
the numbers might make implementation easier: piece numbering is presumably
a pretty fundamental assumption that would be awkward to break.

> BITFIELD is useful though, so maybe numbering should be kept. The key is
> whether it can be unique and not grow to too large a size. Some more
> statistics are needed here, ;) such as how many new piece numbers would
> we need per day.

It depends what you're treating as your "torrent". You can look at the
"Version:" lines in dists/sid/main/binary-i386/Packages.diff/*.gz (eg)
to get some idea of how many updates there are -- by the looks of things
it's anywhere from 12 to 223 per half-day, or about 122 per day averaged
over the past week, which is maybe 43k per year, and an increase in the
size of the bitfield by about 5kB each year. Of course we're in the last
stages of a freeze at the moment, so that could be a wild underestimate.
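
The arithmetic behind that estimate, as a small sketch. It assumes every
updated package gets one new piece number that is never reused, and one
bit per piece number in the bitfield. The straight product comes out a
little above the rounded "43k", but in the same ballpark.

```python
updates_per_day = 122                      # weekly average quoted above
updates_per_year = updates_per_day * 365   # assumes the rate stays constant
growth_bytes = updates_per_year / 8        # one bit per new piece number

print(updates_per_year)  # 44530
print(growth_bytes)      # 5566.25
```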

> >> "how do peers communicate BITFIELD information of all the pieces they
> >>  have when the pieces are no longer numbered"
> > Probably by sending a lot of "HAVE ..." notices?
> Yeah, like a few thousand, every time you connect to a new peer? That
> would probably mean a lot of wasted bandwidth and connections. If we use
> the unique piece numbering I mentioned above, it could also grow to the
> point where we have bitfields that are too long. BitTorrent currently
> sends bitfields as "001010..." indicating that it has pieces 2 and 4,
> but needs 0,1,3,5. So, our bitfields could grow to the point of being
> 100 KB in length or much more. It should be pretty easy to convert this
> bit string into a more efficient Hex or even binary representation to
> save on communication costs though.

Hrm? I thought it was already a real bitfield? "The first byte in
the bitfield corresponds to indices 0-7 from high bit to low bit,
respectively" -- www.bittorrent.org/protocol.html

There's currently about 20k packages in sid/binary-i386 (including
arch:all) which makes the bitfield 2.5kB, afaics.
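
A minimal sketch of that packing, following the quoted rule that index 0
sits in the high bit of the first byte. The pack_bitfield helper is
hypothetical, and the 20000 figure assumes one piece per package.

```python
def pack_bitfield(have, num_pieces):
    """Pack piece indices into bytes, index 0 in the high bit of byte 0."""
    field = bytearray((num_pieces + 7) // 8)   # round up to whole bytes
    for index in have:
        field[index // 8] |= 0x80 >> (index % 8)
    return bytes(field)

# The "001010" example from the quoted text: pieces 2 and 4 out of 6.
print(pack_bitfield({2, 4}, 6).hex())     # 28
# Roughly 20k pieces, one per package.
print(len(pack_bitfield(set(), 20000)))   # 2500
```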

Oh, other consideration: the client needs to keep track of availability
for all its peers in memory. Being able to use bitfields for that might
still be worthwhile, even if they can't be in the protocol per se.
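
One way that could look, purely as a sketch: pieces are named by SHA-1 on
the wire, but the client assigns each hash a private bit position and
keeps one integer bitset per peer. The PeerAvailability class and the
peer names are hypothetical.

```python
class PeerAvailability:
    """Track hash-identified pieces per peer using local bitsets."""

    def __init__(self):
        self._index = {}   # piece SHA-1 hex -> locally assigned bit position
        self._peers = {}   # peer id -> int used as a bitset

    def _bit(self, piece_hash):
        # Assign the next free bit position the first time a hash is seen.
        return self._index.setdefault(piece_hash, len(self._index))

    def have(self, peer, piece_hash):
        """Record that a peer announced it has the given piece."""
        bits = self._peers.get(peer, 0)
        self._peers[peer] = bits | (1 << self._bit(piece_hash))

    def peers_with(self, piece_hash):
        """Return the sorted ids of peers known to have the piece."""
        if piece_hash not in self._index:
            return []
        mask = 1 << self._index[piece_hash]
        return sorted(p for p, bits in self._peers.items() if bits & mask)

avail = PeerAvailability()
piece = "9425fa8de16f6283365f6bee87f405da16a203e6"
avail.have("peerA", piece)
avail.have("peerB", piece)
print(avail.peers_with(piece))  # ['peerA', 'peerB']
```

The bit positions never leave the client, so two peers need not agree on
them.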

> Just so you know, [...] Just something to think about.

Well, hopefully tomorrow we find out slot numbers so we really can think
about this stuff. :)

Cheers,
aj




