[Pkg-mediawiki-devel] Preparing for wp-mirror-0.7

wp mirror wpmirrordev at gmail.com
Fri Jan 10 07:43:35 UTC 2014

Dear Ariel,

Happy New Year.  I am gearing up for wp-mirror-0.7.  To that end, I would
like to list some issues that I see; and I would like to offer my help in
solving them.

0) Problem Statements

0.1) Page Rendering.  Wp-mirror-0.6 works well in the sense that it builds
a faithful mirror of any of your wikis.  However, during 2013 the rendering
of pages eroded materially.  For example,

     o interlanguage links have vanished both from rendered pages and from
dump files;
     o infoboxes are no longer rendered;
     o most transclusions now render as redlinks even though the templates
are easily found in the underlying database; etc.

I understand that this erosion occurred because wp-mirror-0.6 still uses
mediawiki-1.19, whereas WMF has moved on to mediawiki-1.23.  For example, I
understand that:

     o interlanguage links have been moved to the wikidata project, the
rendering of which requires mediawiki-1.21+;
     o infoboxes now require the scribunto extension which requires

0.2) Database Schema.  Some differences in database schema have appeared.

     o category - dump files now have 5 fields, whereas the database schema
has 6 fields;
     o externallinks - dump files now have 4 fields, whereas the database
schema has 3 fields.

Loading either of these two tables generates the error message:  ``Column
count doesn't match value at row 1.''
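
A mismatch of this kind can be detected before loading. The following is
a minimal sketch, not part of wp-mirror: it counts the values in the first
row of an INSERT statement from a dump file so that the count can be
compared with the number of columns in the database schema.  The function
name and the sample row are hypothetical, and the parser assumes
mysqldump-style single-quoted strings with backslash escapes.

```python
def count_fields_in_first_row(insert_line):
    """Count the values in the first (...) tuple of an INSERT statement.

    Assumes mysqldump-style quoting: strings in single quotes, with
    special characters escaped by a backslash.
    """
    start = insert_line.index("VALUES") + len("VALUES")
    start = insert_line.index("(", start)
    fields = 1
    in_string = False
    i = start + 1
    while i < len(insert_line):
        ch = insert_line[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the character that follows the backslash
            elif ch == "'":
                in_string = False
        elif ch == "'":
            in_string = True
        elif ch == ",":
            fields += 1  # a comma outside a string separates two values
        elif ch == ")":
            return fields  # end of the first row tuple
        i += 1
    raise ValueError("unterminated row tuple")


# Invented sample row, for illustration only (not taken from a real dump).
sample = r"INSERT INTO `category` VALUES (1,'A, B\'s (x)',3,0,0),(2,'C',1,0,0);"
print(count_fields_in_first_row(sample))
```

If the count differs from the number of columns in the target table, the
load would be expected to fail with the error quoted above.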

0.3) Version Lifecycle.  According to
<http://www.mediawiki.org/wiki/Version_lifecycle> mediawiki 1.23 LTS is
slated for May 2014.  However, the Debian packaging team is silent as to
their plans for a transition from mediawiki-1.19 LTS to mediawiki-1.23 LTS.

0.4) Image Dumps.  The large image dump tarballs are now a year old.  This
means that, while wp-mirror still downloads the bulk of its images from
these tarballs, a growing number of images must be downloaded individually
from WMF.

0.5) Thumbs.  One person has asked me if dump files of thumbs could be made
available. We are beginning to see thumb dumps from the xowa project.

0.6) IPv6.  I am glad to see that <gerrit.wikimedia.org> has an IPv6
address.  However, <bastion.wmflabs.org> still does not.  My internal
network is IPv6 only.
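
Whether a host publishes an IPv6 address can be checked from a script.
This is a minimal sketch using only the Python standard library; the
function name and the injectable resolver argument are my own choices,
made so that the check can be exercised without network access.

```python
import socket


def has_ipv6_address(host, resolver=socket.getaddrinfo):
    """Return True if the resolver finds at least one IPv6 address for host."""
    try:
        # Restrict the lookup to the IPv6 address family.
        results = resolver(host, None, socket.AF_INET6)
    except socket.gaierror:
        # The name did not resolve to any IPv6 address.
        return False
    return len(results) > 0
```

Calling has_ipv6_address("gerrit.wikimedia.org") performs a live DNS
lookup, so the result depends on the resolver in use and on the records
published at the time.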

1) mwxml2sql

This utility from Ariel Glenn has proved invaluable to the wp-mirror
project. This utility, together with MySQL 5.5 fast index creation, allows
wp-mirror to build mirrors much faster than before (80% less time).

1.1) Need for update.  According to its version information, mwxml2sql may
only be valid through mediawiki-1.21.

(shell)$ mwxml2sql --version
mwxml2sql 0.0.2
Supported input schema versions: 0.4 through 0.8.
Supported output MediaWiki versions: 1.5 through 1.21.

Since I am looking forward to mediawiki-1.23 LTS (see below), I would
like to know if mwxml2sql should be updated.
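
The version check could be automated.  The sketch below parses the
``Supported output MediaWiki versions'' line from the output shown above
and tests whether a target release falls within the range.  The function
names are hypothetical, and the code assumes that the output format stays
as shown.  Versions are compared as (major, minor) tuples, because
comparing them as decimal numbers would wrongly rank 1.5 above 1.21.

```python
import re

# Output of `mwxml2sql --version`, as quoted in this message.
VERSION_TEXT = """mwxml2sql 0.0.2
Supported input schema versions: 0.4 through 0.8.
Supported output MediaWiki versions: 1.5 through 1.21.
"""


def supported_output_range(version_text):
    """Return ((major, minor), (major, minor)) for the output versions."""
    m = re.search(
        r"Supported output MediaWiki versions:\s*"
        r"(\d+)\.(\d+) through (\d+)\.(\d+)",
        version_text,
    )
    if m is None:
        raise ValueError("version range not found")
    low = (int(m.group(1)), int(m.group(2)))
    high = (int(m.group(3)), int(m.group(4)))
    return low, high


def supports(version_text, target):
    """Return True if target, a (major, minor) tuple, is within the range."""
    low, high = supported_output_range(version_text)
    return low <= target <= high


print(supports(VERSION_TEXT, (1, 19)))  # True: 1.19 lies within 1.5..1.21
print(supports(VERSION_TEXT, (1, 23)))  # False: 1.23 is above 1.21
```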

1.2) Help Offer.  If mwxml2sql does need updating, I would be happy to help
with this; and to package it for Debian as I have done before. Perhaps we
could call it mwxml2sql-0.0.3.

2) mediawiki-1.23 LTS.

2.1) Vision. I would like wp-mirror-0.7 to be able to build a mirror that
serves pages that look no different than those served by WMF.

2.2) DEB package.  To that end, I am thinking of packaging mediawiki-1.23
together with the extensions needed for rendering WMF wikis with wikidata
content, infoboxes, math, transclusions, etc.   Given WMF's ``continuous
integration'' development model, I would like to be able to automatically
generate a tarball and DEB package each time WMF pushes an update to its

2.3) Debian package repository.  Such a DEB package would be distributed
with wp-mirror. In preparation for this, I have set up a Debian package
repository at <http://download.savannah.gnu.org/releases/wp-mirror/>.  It
is currently used to distribute wp-mirror-0.6 and an unstable version of
wp-mirror-0.7.  Home page <http://www.nongnu.org/wp-mirror/>.

2.4) Help Offer.  I am happy to do most of this work myself.  However, I
will need some guidance on interacting with the appropriate GIT
repositories.  I hope that you can put me in touch with someone involved in
the ``continuous integration'' process.

3) Media dumps

I am thinking that updating the image dumps annually would be adequate.
Including thumbs in those dumps would materially assist the off-line
community.  I could easily update wp-mirror-0.7 to give the user a choice
(no media files, thumbs only, full size media files).

Sincerely Yours,