[gopher] Gopher++ scrapped & Internet Archive -style thingy

Mike Hebel nimitz at nimitzbrood.com
Tue Apr 20 12:40:23 UTC 2010


On Apr 20, 2010, at 4:25 AM, Kim Holviala wrote:

> As part of my project to code a neat search engine to cover the  
> whole Gopherspace I've (partially) crawled sites and snooped and  
> researched a lot of stuff.
>
> Let's just say that the Gopherspace is small, but interesting. I'm  
> glad I started crawling :-).
>
> Anyway.
>
> Whatever I've written about the gopher++ extra headers can now be  
> considered obsolete. I found a few live sites which cannot accept  
> anything other than a selector<CRLF>, so there's no way I can  
> insert extra headers without breaking stuff. Those sites even break  
> with type 7 queries (and gopher+), so I'm kind of giving up now.
>
> All code regarding the header extensions has been scrapped and  
> deleted, it's all gone for good. The good thing is that my code is  
> now 100% compatible with ALL early 90's servers but the bad thing is  
> that the neat charset conversion thingy is now all gone and we're  
> back to 7-bit US-ASCII (or non-working Latin/UTF). Oh, well.
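For context, the constraint Kim describes is that an original-protocol request is nothing but the selector followed by CRLF; anything extra after that line is what breaks those early-90s servers. A minimal sketch in Python (the function names are mine, not from Kim's code):

```python
import socket

def build_request(selector):
    # The original Gopher request: selector + CRLF, nothing more.
    # Extra header lines after the selector are exactly what the
    # gopher++ experiment added, and what old servers choke on.
    return selector.encode("ascii", "replace") + b"\r\n"

def gopher_fetch(host, selector="", port=70, timeout=10):
    """Send a bare request and read until the server closes the socket."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(selector))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```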

I'm confused here.  Is this the client side of things or the server  
side?  If the goal is to keep Gopher moving forward, then why not  
create a better server with an expanded protocol?  And if it's just  
your servers that do the gopher++ dance, why does it matter if  
other servers don't?  Other than crawling, the servers don't interact  
as far as I can tell.  (Unless I'm once again being dense.)

> As my search engine's indexer is an offline one, my spider basically  
> crawls around and saves all type 0 & 1 files to a local cache  
> hierarchy. This was mostly accidental, but I managed to create  
> something very much like the Internet Archive, but for gopher.  
> Basically, you give the cache manager a URL and it gives you back  
> the cached page (if it has it) AND it mangles menus so that as long  
> as the pages are in cache you'll stay in the cache.
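The menu mangling can be pictured like this: a Gopher menu line is tab-separated (type+display, selector, host, port), and the cache gateway rewrites each type 0/1 item so its selector routes back through the cache. A sketch of the idea, assuming a `cache.q?gopher://host:port/type/selector` convention inferred from the example URLs (the exact selector syntax and gateway hostname are guesses, not Kim's actual code):

```python
CACHE_HOST = "gophernicus.org"   # hypothetical cache gateway host
CACHE_PORT = 70

def mangle_menu(menu_text, cache_selector="cache.q"):
    """Rewrite a Gopher menu so every text (0) and menu (1) item
    points back into the cache gateway; other item types and info
    lines pass through untouched."""
    out = []
    for line in menu_text.splitlines():
        parts = line.split("\t")
        if len(parts) < 4 or line[:1] not in "01":
            out.append(line)  # info lines, errors, other types
            continue
        type_display, selector, host, port = parts[0], parts[1], parts[2], parts[3]
        itemtype = type_display[0]
        # Absolute URL of the original resource, embedded in the
        # cache gateway's search-style selector.
        target = "gopher://%s:%s/%s%s" % (host, port, itemtype, selector)
        new_selector = "%s?%s" % (cache_selector, target)
        out.append("\t".join([type_display, new_selector,
                              CACHE_HOST, str(CACHE_PORT)]))
    return "\r\n".join(out)
```

Following a rewritten link hits the gateway again, which rewrites the next menu in turn, so a browsing session stays inside the cache for as long as pages are available.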
>
> It's kind of like a combination of Google's cache and archive.org,  
> only it works better than either of those...
>
> Here's a cached copy of (partial) Floodgap:
> gopher://gophernicus.org/1/cache.q?gopher://gopher.floodgap.com
>
> It even cached itself:
> gopher://gophernicus.org/1/cache.q?gopher://gophernicus.org
>
> Notice how the cached Floodgap is much faster than the original  
> one ;D. I wish there was something like this for the web....
>
> <turtleneck shirt mode on>
> One more thing,
> </turtleneck>
>
> I'll be crawling everything in about a month or so, so now is the  
> time to fix your robots.txt if you don't want your files to end up  
> in the cache.
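For anyone who wants to opt out before the crawl: Gopher robots.txt files follow the same conventions as the web version, served as the selector `robots.txt` at the server root. A minimal example (the crawler's user-agent string isn't stated in the post, so the wildcard form is the safe one; the paths are placeholders):

```
User-agent: *
Disallow: /private/
Disallow: /drafts/
```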

Very cool. :-)

--
Mike


"All we wanna do is eat your brains! We're not unreasonable, I mean no  
one's gonna eat your eyes." - "Re: Your Brains", Jonathan Coulton




More information about the Gopher-Project mailing list