[gopher] The Gopher Archive

Kim Holviala kim at holviala.com
Fri Apr 23 11:21:23 UTC 2010


On 22.4.2010 23:07, Brian Koontz wrote:

>> I'm archiving Teh Gopher. All of it - well all textual searchable
>> information, not binaries nor images...
>
> The next logical step would be to set up a mechanism to mirror the
> archive, because we all know what happens when one large repository
> suddenly goes down for what will likely be forever (hal3000.cx,
> anyone)?

Replace $ROOT with whatever directory you want to keep the files in.

$ rsync rsync.gophernicus.org::archive/
drwxr-xr-x        4096 2010/04/19 15:51:09 .
drwxr-xr-x        4096 2010/04/23 01:06:42 sites

$ rsync -avz --progress rsync.gophernicus.org::archive/ $ROOT/
receiving incremental file list
created directory  $ROOT
./
sites/
sites/last
           29 100%   28.32kB/s    0:00:00 (xfer#1, to-check=1066/1069)
sites/1/
sites/1/155.198.1.33:70/
sites/3/
sites/3/gopher.386server.info:70/
[...]

The archive directory structure is pretty simple: all of the sites are
under, uh, sites/ (more directories will be added there) and they are
grouped by the first character of the primary domain name.

So for example gopher.floodgap.com's port 70 can be found under
$ROOT/sites/f/gopher.floodgap.com:70/
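
As a rough sketch, the site directory for a host could be computed like
this. Note the assumptions: the post does not define "primary domain", so
I'm guessing it is the second-to-last dot-separated label for hostnames
and the first character for bare IPv4 addresses (that matches the three
sites shown in the listing above, but may well be wrong for things like
foo.co.uk). The function names site_letter and site_dir are made up for
illustration.

```shell
# Guess the grouping character for a host (ASSUMPTION, see text above).
site_letter() (
    host=$1
    case $host in
        # Contains something other than digits and dots: treat as a hostname,
        # drop the last label and keep the label before it.
        *[!0-9.]*) label=${host%.*}; label=${label##*.} ;;
        # Digits and dots only: treat as an IPv4 address.
        *) label=$host ;;
    esac
    # Print only the first character of the chosen label.
    printf '%s\n' "${label%"${label#?}"}"
)

# Build the site directory from a root, a host and a port.
site_dir() {
    printf '%s/sites/%s/%s:%s\n' "$1" "$(site_letter "$2")" "$2" "$3"
}
```

Calling site_dir "$ROOT" gopher.floodgap.com 70 would then give the
floodgap directory shown above.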

Under each site directory there are one or more subdirectories; the
archived files are under the cache/ directory. In there you have
one-character directories named after the first character of the md5 sum
of the original selector. The actual downloaded files are saved with the
md5 sum of the selector as the filename, and they contain some MIME-style
headers, a double CRLF and a bit-perfect unmodified copy of the original
file.

Uh, complicated.

Let's take this file from floodgap:
/archive/walnut-creek-cd-simtel/BEEHIVE/MYZ80/00-INDEX.TXT

$ printf "/archive/walnut-creek-cd-simtel/BEEHIVE/MYZ80/00-INDEX.TXT" | md5sum

e9c26adf54530a785378971bbac7cd23  -

$ ls -la $ROOT/sites/f/gopher.floodgap.com:70/cache/e/e9c26adf54530a785378971bbac7cd23

-rw-r--r-- 1 kimmy users 1334 2010-04-23 14:12 $ROOT/sites/f/gopher.floodgap.com:70/cache/e/e9c26adf54530a785378971bbac7cd23
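
The same lookup can be wrapped in a small helper. This is only a sketch:
cache_path is a hypothetical name, and it assumes md5sum and cut are
available and that the selector is hashed exactly as given, with no
trailing newline (which is what the printf above does).

```shell
# Print the cache file path for a selector under a given site directory.
# $1 = site directory, $2 = selector
cache_path() (
    site=$1 selector=$2
    # '%s' keeps printf from interpreting % or \ inside the selector.
    sum=$(printf '%s' "$selector" | md5sum | cut -d' ' -f1)
    # The subdirectory is the first character of the md5 sum.
    printf '%s/cache/%s/%s\n' "$site" "${sum%"${sum#?}"}" "$sum"
)
```

For the floodgap selector above, something like

cache_path "$ROOT/sites/f/gopher.floodgap.com:70" "/archive/walnut-creek-cd-simtel/BEEHIVE/MYZ80/00-INDEX.TXT"

should end in cache/e/ followed by the md5 sum shown earlier.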

$ head -20 $ROOT/sites/f/gopher.floodgap.com:70/cache/e/e9c26adf54530a785378971bbac7cd23

Location: gopher://gopher.floodgap.com:70/0/archive/walnut-creek-cd-simtel/BEEHIVE/MYZ80/00-INDEX.TXT
Host: gopher.floodgap.com:70
Filetype: 0
Selector: /archive/walnut-creek-cd-simtel/BEEHIVE/MYZ80/00-INDEX.TXT
Referer: gopher://gopher.floodgap.com:70/1/archive/walnut-creek-cd-simtel/BEEHIVE/MYZ80
Name: 00-INDEX.TXT
Title: /archive/walnut-creek-cd-simtel/BEEHIVE/MYZ80/00-INDEX.TXT
Date: 2010-Apr-23 11:12
Timestamp: 1272021150
Size: 892

MYZ80111.ZIP   105339  05-22-93  V1.11 Of Simeon Cran's CP/M emulator for the
                                | PC. This is Simeon Crans' complete CP/M
                                | package for the PC. It needs a 286 (or
                                | better) to run and is packed with goodies,
                                | such as the ability to run CP/M 2.2 or 3.0,
                                | 32-bit processor aware, multitasker aware,
                                | ADM3A/Televideo emulation, complete key
                                | re-mapping, etc etc. You've tried the rest,
                                | now try the BEST!! I haven't seen a better
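
To get the original file back out of a cache file, you need to drop the
headers. A minimal sketch, assuming the headers always end at the first
empty line (the double CRLF) and that awk, head, tail and tr are around;
header_end, archive_headers and archive_body are made-up names:

```shell
# Print the line number of the first empty line (a lone CR counts as empty).
header_end() {
    awk '{ sub(/\r$/, "") } $0 == "" { print NR; exit }' "$1"
}

# Print the headers of a cache file with the CRs stripped.
archive_headers() (
    n=$(header_end "$1")
    [ -n "$n" ] || exit 1          # no separator found
    head -n "$((n - 1))" "$1" | tr -d '\r'
)

# Print everything after the separator, byte for byte.
archive_body() (
    n=$(header_end "$1")
    [ -n "$n" ] || exit 1          # no separator found
    tail -n "+$((n + 1))" "$1"
)
```

archive_body uses tail rather than a line-by-line filter so the body is
passed through untouched.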




- Kim


