debian source mirror as git repos

Zenaan Harkness zen at freedbms.net
Fri May 22 04:00:58 UTC 2015


(take 2, with corrected To address)

I use debmirror to mirror a few dist/arch combinations, which I find
extraordinarily useful on a regular basis.

A few times a year I also pull Debian "source" packages into my mirror
pool. A full, source-inclusive debmirror run is the only time I clean
up old files in the repo, and today is one such day. It's not fast on
this rural connection.

I notice that iceweasel's source package files are numerous (e.g. its
various translations) and apparently fairly voluminous. If Iceweasel's
source were available as a git repo, updating the Debian sources for
Iceweasel would consume much less bandwidth, which would be a good
thing.

Given how Debian versions and distributes its binary packages and the
source packages they are built from, could source updates be
distributed to the Debian mirror network (e.g. for local-mirror junkies
like me, as well as for those tracking individual packages) as, say,
daily git update packs/bundles, using something like git send-pack?
(Only when there is actually an update, of course: updates would not
be daily, just "not more frequent than" daily. The update frequency
might be driven by the maintainer or by the ftp masters.)

The resulting reduction in (full) source file load on the mirror
network might allow for update intervals other than "at most daily",
e.g. weekly and/or monthly update packs for those who update their
mirrors less often. That may be entirely unnecessary, though, since a
mirror could simply feed git-receive-pack as many daily packs as there
are.
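
To illustrate why weekly or monthly packs may be unnecessary, here is a
minimal Python sketch of the mirror side: a mirror that updates rarely
just applies every daily pack published since its last update, oldest
first. It assumes a published list of days that had packs; the file
naming scheme and the names `pack_name` and `packs_to_apply` are
hypothetical, not an existing Debian convention.

```python
from datetime import date
from typing import Iterable, List, Optional


def pack_name(day: date) -> str:
    # Hypothetical naming convention: one file per day that had an update.
    return f"debian-update-{day.isoformat()}.bundle"


def packs_to_apply(available: Iterable[date], last_applied: Optional[date]) -> List[str]:
    """Daily packs a mirror still has to apply, oldest first.

    `available` lists the days for which a pack was published (days without
    uploads have no pack); `last_applied` is None for a fresh mirror.
    """
    pending = sorted(d for d in set(available) if last_applied is None or d > last_applied)
    return [pack_name(d) for d in pending]


if __name__ == "__main__":
    published = [date(2015, 5, 18), date(2015, 5, 20), date(2015, 5, 21)]
    # A mirror last updated on 2015-05-19 only needs the two later packs.
    for name in packs_to_apply(published, last_applied=date(2015, 5, 19)):
        print(name)
```

If the packs were git bundles, each file could then be applied in that
order, e.g. with `git fetch` using the bundle file as the repository.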

A question (or objection) that may arise is how to produce
deterministic "update packs" so that all mirrors match. If the "pack
definitions" ever need to be regenerated, they must of course come
from a canonical location (the developer or some Debian server); best
of all would be to define a deterministic algorithm (assuming this is
possible), e.g. a Debian git (or other SCM) update pack is:
- a bzip2-compressed tar file covering one time period (daily or monthly),
- containing all commits from the "begin moment" of that period, inclusive,
- up to the "begin moment" of the following period, exclusive,
- stored sequentially in the Debian SCM update pack file.
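
The half-open periods in that definition can be pinned down by fixing
the boundaries in UTC and sorting before grouping, so every mirror
computes the same assignment. A minimal Python sketch, assuming each
commit carries the time the canonical Debian server accepted it (not
the committer date, which authors can set freely); `Commit`,
`period_label` and `group_into_packs` are hypothetical names. It only
decides which commits go in which pack; producing a byte-for-byte
reproducible tar/bzip2 file would additionally need fixed file order
and timestamps.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# (commit id, acceptance time in Unix seconds). Assumption: the time is when
# the canonical Debian server accepted the commit.
Commit = Tuple[str, int]


def period_label(ts: int, period: str) -> str:
    """Name the period containing ts by its begin moment, in UTC."""
    t = datetime.fromtimestamp(ts, tz=timezone.utc)
    if period == "daily":
        return t.strftime("%Y-%m-%d")
    if period == "monthly":
        return t.strftime("%Y-%m")
    raise ValueError(f"unknown period: {period!r}")


def group_into_packs(commits: List[Commit], period: str) -> Dict[str, List[str]]:
    """Assign each commit to exactly one half-open period [begin, next begin).

    Sorting by (time, id) first makes the output identical on every mirror
    that starts from the same input.
    """
    packs: Dict[str, List[str]] = {}
    for cid, ts in sorted(commits, key=lambda c: (c[1], c[0])):
        packs.setdefault(period_label(ts, period), []).append(cid)
    return packs


if __name__ == "__main__":
    # 1432252800 is 2015-05-22 00:00:00 UTC; one second earlier is still the 21st.
    commits = [("b", 1432252800), ("a", 1432252799), ("c", 1432267258)]
    print(group_into_packs(commits, "daily"))
```

Here commit "a" lands in the 2015-05-21 pack, while "b", accepted at
exactly midnight, opens the 2015-05-22 pack: the begin moment is
inclusive and the next begin moment exclusive.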

Primary object types in this new Debian "source mirror":
- git (or other scm) --mirror (or "--bare" style) repos
- Debian git (or other scm) update packs

Since git identifies objects by the hash of their content, it already
handles duplicate objects, both when git-fast-import is run more than
once on the same update pack and when a manual git fetch has happened
in the meantime.
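
The reason duplicates are harmless is content addressing: git names a
blob by the SHA-1 of the header "blob <size>", a NUL byte, and the
content, so the same content always gets the same ID. The Python sketch
below computes that real blob ID, but `ObjectStore` is a hypothetical
toy store, not git's actual loose-object or packfile storage; it only
shows that storing the same object twice is a no-op.

```python
import hashlib
from typing import Dict


def git_blob_id(content: bytes) -> str:
    """Object ID git assigns to a blob (SHA-1 of header, NUL byte, content)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy content-addressed store: adding the same object twice is a no-op."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def add_blob(self, content: bytes) -> bool:
        """Store a blob; return True only if it was not already present."""
        oid = git_blob_id(content)
        if oid in self.objects:
            return False
        self.objects[oid] = content
        return True


if __name__ == "__main__":
    store = ObjectStore()
    print(store.add_blob(b"hello\n"))  # first import stores it
    print(store.add_blob(b"hello\n"))  # re-import: already present
```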

Part of the nature of these "source mirror" repos is that local mirror
maintainers are free to add extra "remotes" to their "Debian source
mirror repo". This new Debian source distribution concept simply
ensures that a repo designated to store the source for a particular
Debian package has all the branches, tags, objects etc. needed to
build the corresponding binary package(s). This might imply certain
namespace conventions that the "Debian repo mirror update utility"
could assume it fully owns, e.g. all tag names beginning with
"debian-", all branch names beginning with "debian-", all remote names
starting with "debian-", and so on.
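
Such a convention would let the update utility tell its own refs apart
from ones the local maintainer added. A minimal Python sketch, assuming
the "debian-" prefix rule above applied to full git ref names (remotes
named "debian-*" show up as "refs/remotes/debian-*/..."); the names
`MANAGED_PREFIXES`, `is_managed` and `refs_safe_to_prune` are
hypothetical.

```python
from typing import Iterable, List

# Hypothetical convention: the update utility owns every branch, tag and
# remote whose name starts with "debian-", and leaves everything else alone.
MANAGED_PREFIXES = (
    "refs/heads/debian-",
    "refs/tags/debian-",
    "refs/remotes/debian-",
)


def is_managed(ref: str) -> bool:
    """True if the ref falls inside the utility's reserved namespace."""
    return ref.startswith(MANAGED_PREFIXES)


def refs_safe_to_prune(local_refs: Iterable[str], upstream_refs: Iterable[str]) -> List[str]:
    """Managed refs that upstream no longer has; user-added refs are never returned."""
    upstream = set(upstream_refs)
    return sorted(r for r in local_refs if is_managed(r) and r not in upstream)


if __name__ == "__main__":
    local = [
        "refs/heads/debian-sid",
        "refs/heads/debian-old",
        "refs/heads/my-patches",         # added by the local maintainer
        "refs/remotes/origin/master",    # extra remote, also left alone
        "refs/tags/debian-1.0",
    ]
    upstream = ["refs/heads/debian-sid", "refs/tags/debian-1.0"]
    print(refs_safe_to_prune(local, upstream))
```

Only "refs/heads/debian-old" would be pruned; the maintainer's own
branch and remote stay untouched because they sit outside the reserved
namespace.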

I'm guessing this has been well and truly thought about by others who
are actually competent in this area, and perhaps it's already almost
here; apologies if this email is too much noise.

Regards,
Zenaan


