[gopher] Spidering teh gopherspace

Kim Holviala kim at holviala.com
Wed Apr 14 18:48:58 UTC 2010


Wow. That option actually exists...

User-agent: Yahoo-Blogs
Crawl-delay: 20
Disallow: /tmp

I'll implement that asap.

[5 minutes later]

Done.
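
For anyone following along, here is a minimal sketch of what Crawl-delay
handling could look like in POSIX C. It is not the spider's actual code: the
names crawl_delay() and wait_between_hits() and the one-second default are
assumptions made for illustration, and User-agent sections are ignored to keep
it short.

```c
#define _POSIX_C_SOURCE 200809L

#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <time.h>

/* Assumed default: one hit per second, as floated earlier in the thread. */
#define DEFAULT_DELAY_SEC 1.0

/*
 * Scan robots.txt text for the first "Crawl-delay:" line and return its
 * value in seconds. Returns DEFAULT_DELAY_SEC when no usable value exists.
 * Simplification: User-agent sections are not taken into account.
 */
double crawl_delay(const char *robots)
{
    static const char key[] = "crawl-delay:";
    const size_t keylen = sizeof(key) - 1;
    const char *line = robots;

    while (line != NULL && *line != '\0') {
        const char *p = line;

        /* Skip leading blanks on this line. */
        while (*p == ' ' || *p == '\t')
            p++;

        if (strncasecmp(p, key, keylen) == 0) {
            const char *q = p + keylen;
            char *end;

            /* Skip blanks between the colon and the value. */
            while (*q == ' ' || *q == '\t')
                q++;

            /* Only accept a value that starts with a digit or a dot. */
            if (isdigit((unsigned char)*q) || *q == '.') {
                double v = strtod(q, &end);
                if (end != q)
                    return v;
            }
        }

        /* Advance to the start of the next line, if there is one. */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return DEFAULT_DELAY_SEC;
}

/* Sleep for the given number of seconds (fractions allowed) between hits. */
void wait_between_hits(double sec)
{
    struct timespec ts;

    ts.tv_sec = (time_t)sec;
    ts.tv_nsec = (long)((sec - (double)ts.tv_sec) * 1e9);
    nanosleep(&ts, NULL);
}
```

The spider would call crawl_delay() once per host after fetching robots.txt and
then wait_between_hits() with the result before each request, so the server
admin's value wins whenever one is given.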


- Kim



On 2010-04-14 17:10, Alex Nordlund wrote:
> Suggestion:
>
> Let the user decide in robots.txt the lowest acceptable delay.
>
> (but make sure to have a good default delay)
>
> ---
> //Alex
> "Look boys, the graphical part is done. Now we just have to code it!" --tdist
>
>
>
> On Tue, Apr 13, 2010 at 4:40 PM, Kim Holviala <kim at holviala.com> wrote:
>> Yeah, me again.
>>
>> Since I got the server done I promised to start writing clients. But
>> instead, I got sidetracked and decided to code a search engine first...
>>
>> I got a basic spider up and running (pure POSIX C again) in a couple of
>> hours and successfully tested it against my own server. Which then first
>> throttled me (1-second delay before reply), and after I cleared the sessions
>> inetd decided that it had enough and kicked me totally out.
>>
>> So, being slightly wiser, I inserted some delays into the spidering engine
>> to keep it from killing the server being spidered.
>>
>> And before anyone asks, yes, I'll make it support robots.txt.
>>
>> Anyway, on to a few questions: What kind of spidering rate would the admins
>> here accept? The spider will index types 0 and 1 (text documents and menus)
>> and currently does three hits per second (actually, a hit and a 1/3 second
>> delay). I think that's too fast - so how does a hit per second sound?
>> It'll take forever to spider things, but at least it wouldn't kill anyone's
>> server....
>>
>> I'm also thinking about bandwidth limiting, but I need to see if that's
>> possible (being on the receiving end).
>>
>>
>> - Kim
>>
>>
>> _______________________________________________
>> Gopher-Project mailing list
>> Gopher-Project at lists.alioth.debian.org
>> http://lists.alioth.debian.org/mailman/listinfo/gopher-project
>>



