[gopher] Spidering the gopherspace
Kim Holviala
kim at holviala.com
Tue Apr 13 14:40:08 UTC 2010
Yeah, me again.
Now that the server is done, I promised to start writing clients. But
instead I got sidetracked and decided to code a search engine first...
I got a basic spider up and running (pure POSIX C again) in a couple of
hours and successfully tested it against my own server. The server first
throttled me (a 1-second delay before each reply), and after I cleared
the sessions, inetd decided it had had enough and kicked me out entirely.
So, being slightly wiser, I inserted some delays into the spidering
engine to avoid killing the server being spidered.
And before anyone asks, yes, I'll make it support robots.txt.
Anyway, on to a few questions: what kind of spidering rate would the
admins here accept? The spider will index types 0 and 1 (text documents
and menus) and currently does three hits per second (actually, a hit and
then a 1/3-second delay). I think that's too fast - so how does one hit
per second sound? It'll take forever to spider things, but at least it
won't kill anyone's server....
I'm also thinking about bandwidth limiting, but I need to see if that's
possible (being on the receiving end).
- Kim