[Debtorrent-devel] Fwd: BitTorrent Protocol Expansion (Google SoC)

Cameron Dale camrdale at gmail.com
Fri Apr 13 01:50:38 UTC 2007


---------- Forwarded message ----------
From: Anthony Towns <aj at azure.humbug.org.au>
Date: Apr 8, 2007 12:42 AM
Subject: Re: BitTorrent Protocol Expansion (Google SoC)
To: Cameron Dale <camrdale at gmail.com>


On Sat, Apr 07, 2007 at 06:58:09PM -0700, Cameron Dale wrote:
> It's the only way (that I can see) to solve the problem of "all
> downloaders of a package need to be aware of all other downloaders"
> and therefore get the desired efficiency from the download.

I think we've got a bit of a disconnect here -- "all downloaders of
___ need to be aware of all other downloaders" doesn't sound like what
happens in bittorrent at all to me -- you just get a random selection
of other downloaders and go from there. Feel free to go into academic
p2p lecture mode on that for a paragraph if you like :)

> >I'm not sure 100 days ago is interesting anyway though. Presumably you can
> >expect to mostly have a few different likely frequencies for
> >testing/unstable users:
> >        * obsessive-compulsive updating of every change, ASAP
> >        * once every 12, 24 or 48 hours
> >        * once or twice a week
> >I wouldn't really expect it to be all that interesting to worry much beyond
> >that frequency -- you're not going to have enough peers that low for it to
> >be interesting, afaics.
> I'm quite a bit behind these myself: I have a testing box whose
> Packages file I probably only update about once a month, and an
> unstable box I update maybe once every 2 weeks. I could be in the
> minority though, I'm not sure. Some numbers on this would be nice, but
> I have no idea where we could get them from.

IP analysis of ftp.debian.org logs could work.

I guess my theory is that if you're going to have a daemon running
constantly sharing files with other people on the net, you're going to
be fairly up to date anyway -- and if you're only up once a month or
whatever, you might as well be sharing the current files then anyway.

> >But even so, at once a week with 12 archive updates a day, you're
> >potentially up to 84 different Packages files anyway (though currently
> >no more than 14).
> As I said above, this makes it even more beneficial to not have a new
> torrent every time, as each torrent will have a smaller number of
> peers in it as the number of torrents increases. Then it could be
> really difficult to find a peer in your torrent with the piece/package
> you're looking for, especially if that piece is not one of the most
> popular ones.
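To make the arithmetic in that exchange explicit: the 84 is just archive updates per day multiplied by the days between a user's updates, and spreading a fixed peer population over more torrents shrinks each swarm. A tiny sketch; the 12-per-day and once-a-week rates are the hypothetical numbers quoted above, the 840 peers is an invented figure, and the even spread is a simplifying assumption:

```python
def distinct_packages_files(archive_updates_per_day: int, days_between_user_updates: int) -> int:
    """Upper bound on how many different Packages files (and so, with one
    torrent per Packages file, how many torrents) peers could be spread across."""
    return archive_updates_per_day * days_between_user_updates


def peers_per_torrent(total_peers: int, torrents: int) -> float:
    """Average swarm size, assuming peers are spread evenly across torrents."""
    return total_peers / torrents


torrents = distinct_packages_files(12, 7)
print(torrents)                          # 84
print(peers_per_torrent(840, torrents))  # 10.0 (840 peers is a made-up figure)
```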

At any rate, sounds like we've got two plausible implementations, that
aren't really all that different, so seems worth analysing tradeoffs, no?

My main concerns are implementational:

        - changing piece sizes from "constant + one small piece at the end"
          to variable is a major change

        - sharing files between torrents seems a worry, but a necessary
          one since we'll want to not double the space people need
          to watch testing and unstable.

        - extending torrents as time passes seems "new" and might be
          difficult to implement; possibly that should be left 'til later

        - there's a real issue if the torrents have the same "file"
          with different contents (e.g. an old torrent had a file in the
          pool, which was deleted, then later recreated with new contents
          and included in a new torrent; this *should* never happen,
          but it's not 100% assured)
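On the first point: stock BitTorrent metainfo has a single "piece length", so every piece except the last has the same size and any piece's offset is just its index times that length. A per-package layout gives every piece its own length. A minimal sketch of the two layouts, where the file sizes are made up and one-piece-per-file is only one possible variable scheme:

```python
from typing import List, Tuple


def fixed_pieces(total_length: int, piece_length: int) -> List[Tuple[int, int]]:
    """Standard layout: (offset, length) pieces of a constant size, where
    only the final piece may be shorter."""
    pieces = []
    offset = 0
    while offset < total_length:
        length = min(piece_length, total_length - offset)
        pieces.append((offset, length))
        offset += length
    return pieces


def per_file_pieces(file_sizes: List[int]) -> List[Tuple[int, int]]:
    """Variable layout: one piece per package file, so each piece can have a
    different length and no piece straddles two files."""
    pieces = []
    offset = 0
    for size in file_sizes:
        pieces.append((offset, size))
        offset += size
    return pieces


sizes = [300, 1000, 50]  # hypothetical .deb sizes in bytes
print(fixed_pieces(sum(sizes), 512))  # [(0, 512), (512, 512), (1024, 326)]
print(per_file_pieces(sizes))         # [(0, 300), (300, 1000), (1300, 50)]
```

Code that assumes `offset = index * piece_length` works for the first layout but not the second, which is why the change touches so much.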

I think the goal has to be getting something that works (and can
reasonably be made more efficient in future) rather than getting something
that's as good as possible first. But you know a lot more about BitTornado
than I do.

> >             torrent a           torrent b
> >Peer 1           Y
> >Peer 2           Y                  Y
> >Peer 3                              Y
> >So if Peer 1 has a piece that's common across both torrents, Peer 2 will
> >get it via torrent a, then be able to share it with Peer 3 via torrent b.
> Yes, but the chances of that happening are slim, as it needs to be a
> package that Peer 2 wants as well. Add in 84 possible torrents, and
> the chances decrease even more. I don't think it's something we could
> depend on as being possible.

Well, we can always make the initial requirement/expectation be that
everyone mirrors the entire torrent as far as possible. Even if not
everyone does, it'll increase the odds enough to be sustainable, and
for a first version, I think that's fairly reasonable anyway.

> >Treating the path in the pool
> >(pool/main/g/gamin/libgamin0_0.1.7-4_powerpc.deb)
> >as unique-per-file should be fine for that in almost all cases, fwiw.
> If you mean communicating the path as the unique piece identifier,
> then this is the same as using the SHA1 hash of the piece as the piece
> number, instead of using some kind of sequential piece numbering.

Oh, that's not actually sufficient btw -- we can easily have two files
in the pool with the same SHA1. This'll particularly happen if two source
packages use the same upstream (eg, contrib/f/foo/foo.orig.tar.gz becomes
main/f/foo/foo.orig.tar.gz when one of its dependencies becomes free,
or a source package gets renamed within a component without its upstream
changing). You can handle that adequately on the client side of course,
without worrying about it in the protocol.

I meant it more for ease of discussion, though.
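A sketch of the client-side handling mentioned above, assuming the client knows each pool path's expected SHA1 (the paths and contents here are hypothetical): group paths by hash, so a single downloaded piece satisfies every path that shares it.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def index_by_sha1(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group pool paths by their expected SHA1 (hex), so one downloaded
    piece can be written out to every path that shares that hash."""
    index: Dict[str, List[str]] = defaultdict(list)
    for path, sha1 in entries:
        index[sha1].append(path)
    return dict(index)


def paths_satisfied_by(data: bytes, index: Dict[str, List[str]]) -> List[str]:
    """Return every pool path whose expected SHA1 matches the downloaded data."""
    return index.get(hashlib.sha1(data).hexdigest(), [])


data = b"hypothetical upstream tarball contents"
digest = hashlib.sha1(data).hexdigest()
index = index_by_sha1([
    ("pool/contrib/f/foo/foo.orig.tar.gz", digest),
    ("pool/main/f/foo/foo.orig.tar.gz", digest),
])
print(paths_satisfied_by(data, index))
# ['pool/contrib/f/foo/foo.orig.tar.gz', 'pool/main/f/foo/foo.orig.tar.gz']
```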

> >sid and experimental don't have a defined endpoint; I'm not sure what
> >you'd want to do about them. I'm not sure what (if anything) you'd do
> >when a new suite (like lenny) gets introduced either.
> I'm not sure what to do with sid and experimental either.

experimental's small enough you can just ignore it.

> >Yup, we got word a couple of hours ago, only nine slots. :(
> That's low. I was definitely expecting more. Any idea you can give me
> on this project's chances?

So far it seems fine, no negative comments, a variety of support;
currently #4, but I think we'll have to drop one of #2/#3. None of that's
really meaningful until we do a final review though.

> >Daniel Burrows <dburrows at debian.org> and Michael Vogt <mvo at debian.org>
> >have both offered to help (particularly with apt-acquire code if
> >necessary) in the private comments on your app btw.
> Great! I see they're strong in the apt department, which will be
> useful as it is my weakest point in this project.

Yup.

> I'm having some trouble judging the tone of your emails sometimes. We
> seem to be going back and forth a lot on the same issue, and I'm not
> sure if it's because you really dislike my proposal, don't understand
> it, or are just trying to generate discussion (maybe as a test?).

(a) I think we're covering a bunch of the issues that'll end up being
    important, including how we "name" pieces, and what we expect peers
    to actually be doing

(b) Dealing with the ways the archive changes is important and difficult,
    so seems worth discussing up front

(c) I like discussing the concepts heavily up front prior to implementation;
    I consider it a character flaw, and don't think it's a reason to stop you
    from diving into implementation, particularly if it can be changed later :)

(d) It's not my pet implementation, of course I dislike it :)

I'm presuming that since it's _your_ pet implementation, you're more
than happy to keep defending it :)

BTW,

] create a torrent for every combination of suite
] (stable/testing/unstable) and architecture, including separate ones for
] architecture:all and source

If that's going to happen, it seems to me like the way to do it is to
add a field to the top level Release file (dists/testing/Release etc.)
like "Torrent-Prefix: xyzzy" and have the torrent be identified using
that string, the component (main, contrib, etc.), and the architecture,
all sha1'ed or whatever as appropriate. That makes it fairly easy to
choose when to reset the torrent, and also lets you share a torrent if
you like (i.e., testing and unstable could both use the same prefix).
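A minimal sketch of that scheme. "Torrent-Prefix" is only the field proposed here, not an existing Release field; the parser handles just single-line "Key: value" fields; and the "/" join before hashing is an arbitrary choice for illustration:

```python
import hashlib
from typing import Dict


def parse_release_fields(text: str) -> Dict[str, str]:
    """Minimal parser for single-line 'Key: value' fields of a Release file.
    Continuation lines (starting with whitespace), such as checksum lists,
    are skipped; this is not a full deb822 parser."""
    fields = {}
    for line in text.splitlines():
        if not line or line[0].isspace() or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def torrent_id(prefix: str, component: str, architecture: str) -> str:
    """Identify a torrent by SHA1 over prefix, component and architecture
    (hypothetical scheme; the '/' separator is an arbitrary choice)."""
    return hashlib.sha1(f"{prefix}/{component}/{architecture}".encode()).hexdigest()


testing = parse_release_fields(
    "Suite: testing\nTorrent-Prefix: xyzzy\nMD5Sum:\n 0123 456 main/binary-i386/Packages\n"
)
unstable = parse_release_fields("Suite: unstable\nTorrent-Prefix: xyzzy\n")
same = torrent_id(testing["Torrent-Prefix"], "main", "i386") == torrent_id(
    unstable["Torrent-Prefix"], "main", "i386"
)
print(same)  # True
```

Since the ID depends only on the prefix, two suites publishing the same prefix land in the same torrent, and changing the prefix starts a fresh one.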

You also need /some/ way to identify pieces, which is presumably going to
be a long string (SHA1 of contents, name from pool etc) or an "arbitrary"
piece number that's going to have to be kept somewhere and distributed
as part of the Packages file. The latter is something that would have to
be stored in dak (the archive management scripts/database) and added to
the Packages files through apt-ftparchive somehow. I'm not sure that'll
be easy, so I'd be really cautious about letting it be a showstopper
for GSoC.
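To put rough numbers on that tradeoff on the wire: the standard peer protocol's "have" message is a 4-byte length prefix, a 1-byte message ID (4) and a 4-byte piece index, so any string key makes each such message larger. The keyed variant below is purely hypothetical, sketched only for comparison:

```python
import hashlib
import struct

HAVE = 4  # message ID of 'have' in the standard BitTorrent peer protocol


def have_by_index(index: int) -> bytes:
    """Standard 'have': 4-byte big-endian length, 1-byte ID, 4-byte index."""
    payload = struct.pack(">BI", HAVE, index)
    return struct.pack(">I", len(payload)) + payload


def have_by_key(key: bytes) -> bytes:
    """Hypothetical variant carrying a variable-length piece key (a raw
    SHA1 digest or a pool path) instead of an index; not part of any spec."""
    payload = struct.pack(">B", HAVE) + key
    return struct.pack(">I", len(payload)) + payload


digest = hashlib.sha1(b"example").digest()  # 20 raw bytes
path = b"pool/main/g/gamin/libgamin0_0.1.7-4_powerpc.deb"
print(len(have_by_index(1234)))  # 9
print(len(have_by_key(digest)))  # 25
print(len(have_by_key(path)))    # 52
```

The index form stays compact, but only if something (dak plus apt-ftparchive, as above) assigns and publishes the numbers.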

Hrm.

Another thought: having a deliberate beta with one torrent per Packages
file with the explicit assumption that it'll be a lot less than optimal
would let us get some real measurements of what people actually do,
just by monitoring the tracker.

Cheers,
aj




