[Debtorrent-devel] Fwd: BitTorrent Protocol Expansion (Google SoC)
Cameron Dale
camrdale at gmail.com
Fri Apr 13 01:48:22 UTC 2007
---------- Forwarded message ----------
From: Anthony Towns <aj at azure.humbug.org.au>
Date: Apr 2, 2007 1:57 AM
Subject: Re: BitTorrent Protocol Expansion (Google SoC)
To: Cameron Dale <camrdale at gmail.com>
On Sun, Apr 01, 2007 at 02:19:27PM -0700, Cameron Dale wrote:
> http://wiki.debian.org/AptBittorrent
Sweet. Some comments:
> "a lot of packages are too small"
I think I did some stats a while ago trying to get a handle on this
to work out piece sizes. No idea what I did with the data then, but
redoing it now seems straightforward. If I use /var/lib/dpkg/available
(which is in Packages file format):
$ sed < /var/lib/dpkg/available -ne 's/^Size: //p' | sort -n > foo.csv
and run that through gnumeric's "statistical analysis" stuff, I get:
Mean                      757,299.01
Standard Error             26,260.22
Median                     94,697.00
Mode                          792.00
Standard Deviation      3,546,976.60
Sample Variance 12,581,043,007,634.50
Kurtosis                      633.33
Skewness                       19.95
Range                 161,312,492.00
Minimum                       736.00
Maximum               161,313,228.00
Sum                13,816,163,106.00
Count                      18,244.00
95% CI for the Mean  from 705,826.52
                     to   808,771.50
A mean package size of 757kB with a std-dev of about 3.5MB is probably
noteworthy, as is the median of only about 95kB; the minimum size of 736
bytes compared to a maximum of 161MB is likewise interesting.
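For anyone wanting to redo these numbers without gnumeric, here is a minimal Python sketch of the same idea. It only assumes that the input is in Packages file format with one "Size: " field per stanza; the function names, the two-package sample, and the 1 MiB candidate piece size are made up for illustration and are not part of any existing tool.

```python
import statistics

def package_sizes(packages_text):
    """Return the integer value of every 'Size: ' field, in file order."""
    sizes = []
    for line in packages_text.splitlines():
        if line.startswith("Size: "):
            sizes.append(int(line[len("Size: "):]))
    return sizes

def summarize(sizes):
    """A few of the summary statistics quoted above."""
    return {
        "count": len(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "mean": statistics.mean(sizes),
        "median": statistics.median(sizes),
    }

def fraction_below(sizes, limit):
    """Fraction of packages strictly smaller than a candidate piece size."""
    return sum(1 for s in sizes if s < limit) / len(sizes)

# Tiny illustrative sample; real input would be the contents of
# /var/lib/dpkg/available or a downloaded Packages file.
SAMPLE = """\
Package: foo
Size: 5341873

Package: bar
Size: 72856
"""

sizes = package_sizes(SAMPLE)
summary = summarize(sizes)
small = fraction_below(sizes, 1048576)  # share of packages under 1 MiB
print(summary, small)
```

The fraction_below() helper is the part that bears on the "too small" concern: it says what share of packages would fit entirely inside one piece of a given size.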
> "Proposal A"
> "communicate all torrent information in the Packages file"
> "pieces can no longer be numbered"
The latter isn't actually true -- if you have a Packages file like:
Package: foo
Size: 5341873
SHA1: 38170c08cb458fd4879c34b6f608294c50312bbb
SHA1-pieces:
e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e 1048576
7448d8798a4380162d4b56f9b452e2f6f9e24e7a 1048576
a3db5c13ff90a36963278c6a39e4ee3c22e2a436 1048576
9c6b057a2b9d96a4067a749ee3b3b0158d390cf1 1048576
5d9474c0309b7ca09a182d888f73b37a8fe1362c 1048576
ccf271b7830882da1791852baeca1737fcbe4b90 98993
Package: bar
Size: 72856
SHA1: 9425fa8de16f6283365f6bee87f405da16a203e6
then you have 7 pieces all up, five of size 1048576, one of size 98993
and one of size 72856, and you can number them in order, ie:
0 -> foo[0]
1 -> foo[1]
2 -> foo[2]
3 -> foo[3]
4 -> foo[4]
5 -> foo[5]
6 -> bar[0]
You're depending on your Packages file being in the same order on
different hosts, but that's more or less ok anyway. The major thing
that changes in that scenario is that _all_ the pieces can be "short",
rather than just the last.
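The numbering above can be sketched in a few lines of Python. This is only an illustration under assumptions: the "SHA1-pieces:" field is the hypothetical one from the example, a package without it is treated as a single piece of its full size, and the function names are invented here.

```python
def parse_piece_sizes(packages_text):
    """Return [(package, [piece lengths])] in the order the file lists them.

    Assumes the hypothetical 'SHA1-pieces:' field from the example, with
    one '<sha1> <length>' pair per following line.
    """
    packages = []
    current = None
    in_pieces = False
    for raw in packages_text.splitlines():
        line = raw.strip()
        if not line:
            in_pieces = False
            continue
        if line.startswith("Package: "):
            current = {"name": line.split(": ", 1)[1], "size": None, "pieces": []}
            packages.append(current)
            in_pieces = False
        elif current is None:
            continue
        elif line.startswith("Size: "):
            current["size"] = int(line.split(": ", 1)[1])
            in_pieces = False
        elif line == "SHA1-pieces:":
            in_pieces = True
        elif in_pieces and ":" not in line:
            _sha1, length = line.split()
            current["pieces"].append(int(length))
        else:
            in_pieces = False
    # A package with no piece list is one piece covering the whole file.
    return [(p["name"], p["pieces"] or [p["size"]]) for p in packages]

def number_pieces(packages):
    """Flatten to a list whose position is the torrent-wide piece number."""
    numbering = []
    for name, lengths in packages:
        for local_index, length in enumerate(lengths):
            numbering.append((name, local_index, length))
    return numbering

EXAMPLE = """\
Package: foo
Size: 5341873
SHA1: 38170c08cb458fd4879c34b6f608294c50312bbb
SHA1-pieces:
 e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e 1048576
 7448d8798a4380162d4b56f9b452e2f6f9e24e7a 1048576
 a3db5c13ff90a36963278c6a39e4ee3c22e2a436 1048576
 9c6b057a2b9d96a4067a749ee3b3b0158d390cf1 1048576
 5d9474c0309b7ca09a182d888f73b37a8fe1362c 1048576
 ccf271b7830882da1791852baeca1737fcbe4b90 98993

Package: bar
Size: 72856
SHA1: 9425fa8de16f6283365f6bee87f405da16a203e6
"""

numbering = number_pieces(parse_piece_sizes(EXAMPLE))
for number, (name, local_index, length) in enumerate(numbering):
    print(number, "->", f"{name}[{local_index}]", length)
```

Because the piece number is just the position in the flattened list, two hosts agree on it only if they walk the same Packages file in the same order, which is the dependency noted above.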
> "Cons"
> "...difficult to find rare pieces"
A simpler approach might be to communicate "I'm planning on downloading
the entire torrent" or "I have downloaded the entire torrent", and
prioritise those peers. We have a bunch of well-connected mirrors around
already and I wouldn't expect that to change, so there's no reason not
to make use of it. And we have lots of people who have a full mirror
for their architecture(s) too who would participate in a p2p scheme,
so if you had a bit to flag those hosts, you'd probably be pretty okay.
Another approach is to have the existing mirror network act as a
backchannel, so that if you can't download foo.deb from any peers in
reasonable time, you grab it from a regular http mirror instead.
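A rough Python sketch of that backchannel idea follows. Everything named here is hypothetical: fetch_from_peers and fetch_from_mirror stand in for whatever the client really uses, and the stubs at the bottom exist only so the example runs without a network.

```python
def fetch_with_fallback(name, fetch_from_peers, fetch_from_mirror, timeout_s):
    """Try the swarm first; fall back to a regular mirror on timeout.

    fetch_from_peers(name, timeout_s) is assumed to return the file's bytes,
    or to raise TimeoutError if no peer delivers it in time.
    fetch_from_mirror(name) is assumed to download it over plain http.
    Returns (data, source) where source is "peers" or "mirror".
    """
    try:
        return fetch_from_peers(name, timeout_s), "peers"
    except TimeoutError:
        return fetch_from_mirror(name), "mirror"

# Stand-in callables for illustration only.
def slow_swarm(name, timeout_s):
    raise TimeoutError(f"no peer delivered {name} within {timeout_s}s")

def fast_swarm(name, timeout_s):
    return b"from-peers:" + name.encode()

def http_mirror(name):
    return b"from-mirror:" + name.encode()

rare = fetch_with_fallback("foo.deb", slow_swarm, http_mirror, 30)
common = fetch_with_fallback("bar.deb", fast_swarm, http_mirror, 30)
print(rare, common)
```

What counts as "reasonable time" is left as a parameter, since the right value is a policy decision rather than something the sketch can settle.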
> "how do peers communicate BITFIELD information of all the pieces they
> have when the pieces are no longer numbered"
Probably by sending a lot of "HAVE ..." notices?
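To give a feel for the cost, here is a small Python sketch of both messages as laid out in the standard BitTorrent peer wire protocol: HAVE is a 4-byte length prefix of 5, message id 4, and a 4-byte big-endian piece index; BITFIELD is message id 5 followed by one bit per piece, high bit first. The bitfield variant assumes pieces can be numbered in Packages file order as described earlier; the function names are made up for this example.

```python
import struct

def have_message(piece_index):
    """One HAVE message: <len=5><id=4><piece index>, all big-endian."""
    return struct.pack(">IBI", 5, 4, piece_index)

def have_messages(piece_indices):
    """Announce every held piece with its own HAVE message."""
    return b"".join(have_message(i) for i in sorted(piece_indices))

def bitfield_message(piece_indices, total_pieces):
    """One BITFIELD message: <len=1+X><id=5><X bytes of bits>.

    The high bit of the first byte corresponds to piece 0.
    """
    bits = bytearray((total_pieces + 7) // 8)
    for i in piece_indices:
        bits[i // 8] |= 0x80 >> (i % 8)
    return struct.pack(">IB", 1 + len(bits), 5) + bytes(bits)

# Using the 7-piece foo/bar example: a peer holding pieces 0 and 6.
held = {0, 6}
as_haves = have_messages(held)
as_bitfield = bitfield_message(held, 7)
print(len(as_haves), len(as_bitfield))
```

Each HAVE costs 9 bytes on the wire, so announcing a large collection one piece at a time grows linearly with the number of pieces held, whereas a bitfield costs one bit per piece in the torrent regardless of how many are held.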
Cheers,
aj
-----BEGIN PGP SIGNATURE-----
iD8DBQFGEMV2Oxe8dCpOPqoRAuD0AKCo4/2VeYGD2L68A2RuyeteyiRvWgCeIIy8
qndEMf7g91yL7axwW4c71I0=
=Fu6f
-----END PGP SIGNATURE-----