[gopher] Spidering the gopherspace

Kim Holviala kim at holviala.com
Tue Apr 13 14:40:08 UTC 2010


Yeah, me again.

Since I got the server done I promised to start writing clients. But 
instead, I got sidetracked and decided to code a search engine first...

I got a basic spider up and running (pure POSIX C again) in a couple of 
hours and successfully tested it against my own server. The server first 
throttled me (a 1-second delay before each reply), and after I cleared 
the sessions, inetd decided it had had enough and kicked me out entirely.
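For anyone curious, the fetch part really is trivial in POSIX C. 
Something roughly like this (just a sketch with a placeholder hostname, 
not the actual spider code):

/* Minimal gopher fetch sketch: connect to port 70, send a selector,
   dump the reply to stdout. The real spider keeps the data for indexing. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

static int gopher_fetch(const char *host, const char *selector)
{
    struct addrinfo hints, *res;
    char req[1024], buf[4096];
    ssize_t n;
    int fd, len;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, "70", &hints, &res) != 0)
        return -1;

    fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0) {
        if (fd >= 0)
            close(fd);
        freeaddrinfo(res);
        return -1;
    }
    freeaddrinfo(res);

    /* A gopher request is just the selector followed by CRLF. */
    len = snprintf(req, sizeof(req), "%s\r\n", selector);
    write(fd, req, (size_t)len);

    while ((n = read(fd, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, (size_t)n, stdout);

    close(fd);
    return 0;
}

int main(void)
{
    /* gopher.example.org is a placeholder, not a real server. */
    return gopher_fetch("gopher.example.org", "") != 0;
}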

So, being slightly wiser, I inserted some delays into the spidering 
engine to keep it from killing the server being spidered.
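Nothing fancy, basically just a nanosleep() after every hit, roughly 
like this:

/* Crude politeness delay: sleep a fixed interval after every request.
   nanosleep() is POSIX and handles sub-second delays. */
#include <time.h>

static void polite_delay(long millisec)
{
    struct timespec ts;

    ts.tv_sec = millisec / 1000;
    ts.tv_nsec = (millisec % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* ...after every fetch: polite_delay(333);  -> three hits per second */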

And before anyone asks, yes, I'll make it support robots.txt.
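I haven't written that part yet, but the check itself shouldn't be much 
more than matching the selector against the Disallow prefixes. A rough 
sketch (no User-agent handling yet):

/* Sketch: return nonzero if 'selector' matches a Disallow prefix.
   Ignores User-agent sections for brevity - the real thing shouldn't. */
#include <string.h>

static int robots_disallowed(const char *robots_txt, const char *selector)
{
    const char *line = robots_txt;

    while (line && *line) {
        if (strncmp(line, "Disallow:", 9) == 0) {
            const char *path = line + 9;
            size_t len;

            while (*path == ' ' || *path == '\t')
                path++;
            len = strcspn(path, "\r\n");
            if (len > 0 && strncmp(selector, path, len) == 0)
                return 1;   /* selector starts with the disallowed path */
        }
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return 0;
}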

Anyway, on to a few questions: what kind of spidering rate would the 
admins here accept? The spider will index types 0 and 1 (text documents 
and menus) and currently does three hits per second (actually, a hit 
followed by a 1/3-second delay). I think that's too fast - so how does 
one hit per second sound? It'll take forever to spider everything, but 
at least it wouldn't kill anyone's server....
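For reference, picking the indexable items out of a menu is just 
splitting the tab-separated fields and keeping types '0' and '1'. A 
rough sketch of that part:

/* Sketch: parse one gopher menu line "Xdisplay\tselector\thost\tport"
   and keep only text files ('0') and menus ('1') for the index queue. */
#include <stdio.h>
#include <string.h>

static void handle_menu_line(char *line)
{
    char type = line[0];
    char *display, *selector, *host, *port;

    if (type != '0' && type != '1')
        return;                 /* skip everything we don't index */

    display  = line + 1;
    selector = strchr(display, '\t');
    if (!selector) return;
    *selector++ = '\0';
    host = strchr(selector, '\t');
    if (!host) return;
    *host++ = '\0';
    port = strchr(host, '\t');
    if (!port) return;
    *port++ = '\0';
    port[strcspn(port, "\t\r\n")] = '\0';

    printf("queue: type %c %s:%s %s\n", type, host, port, selector);
}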

I'm also thinking about bandwidth limiting, but I need to see whether 
that's even possible from the receiving end.
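If it works at all, I suppose it would mean reading in small chunks and 
sleeping in between, so that TCP flow control pushes back on the server. 
A rough sketch of the idea:

/* Sketch: cap the receive rate by sleeping after each read so the
   average stays under bytes_per_sec. TCP flow control then slows
   the sending server down for us. */
#include <time.h>
#include <unistd.h>

static ssize_t throttled_read(int fd, char *buf, size_t len, long bytes_per_sec)
{
    ssize_t n = read(fd, buf, len);

    if (n > 0 && bytes_per_sec > 0) {
        struct timespec ts;
        long ms = (long)n * 1000 / bytes_per_sec;  /* time this chunk "should" take */

        ts.tv_sec = ms / 1000;
        ts.tv_nsec = (ms % 1000) * 1000000L;
        nanosleep(&ts, NULL);
    }
    return n;
}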


- Kim



