[gopher] Gopher++ scrapped & Internet Archive-style thingy
Mike Hebel
nimitz at nimitzbrood.com
Tue Apr 20 12:40:23 UTC 2010
On Apr 20, 2010, at 4:25 AM 4/20/10, Kim Holviala wrote:
> As part of my project to code a neat search engine to cover the
> whole Gopherspace I've (partially) crawled sites and snooped and
> researched a lot of stuff.
>
> Let's just say that the Gopherspace is small, but interesting. I'm
> glad I started crawling :-).
>
> Anyway.
>
> Whatever I've written about the gopher++ extra headers can now be
> considered obsolete. I found a few live sites that cannot accept
> anything other than selector<CRLF>, so there's no way I can insert
> extra headers without breaking stuff. Those sites even break on
> type 7 queries (and Gopher+), so I'm kind of giving up now.
>
> All code regarding the header extensions has been scrapped and
> deleted; it's gone for good. The good thing is that my code is now
> 100% compatible with ALL early-'90s servers, but the bad thing is
> that the neat charset conversion thingy is gone and we're back to
> 7-bit US-ASCII (or non-working Latin/UTF). Oh, well.
I'm confused here. Is this the client side of things or the server
side? If the goal is to keep Gopher moving forward, why not create a
better server with an expanded protocol? And if it's just your
servers that do the gopher++ dance, why does it matter if other
servers don't? Other than crawling, the servers don't interact, as
far as I can tell. (Unless I'm once again being dense.)
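For readers unfamiliar with the request format being discussed: under RFC 1436 a Gopher client sends a single line, the selector followed by CRLF, and a type 7 search puts a TAB and the query between the selector and the CRLF. The Python sketch below is a hypothetical helper, not Kim's code; `build_request` and `fetch` are illustrative names. It builds both forms, and the strict servers Kim describes accept only the first, so anything appended to the selector line risks breaking them. Rejecting TAB, CR and LF in the selector is an assumption based on those characters being the protocol's field delimiters.

```python
import socket
from typing import Optional

CRLF = "\r\n"


def build_request(selector: str, query: Optional[str] = None) -> bytes:
    """Build a Gopher request line (RFC 1436).

    Plain item:     selector<CRLF>
    Type 7 search:  selector<TAB>query<CRLF>
    """
    # TAB, CR and LF delimit protocol fields, so they cannot appear
    # inside the selector itself (assumption stated in the lead-in).
    if any(ch in selector for ch in "\t\r\n"):
        raise ValueError("selector may not contain TAB, CR or LF")
    line = selector if query is None else selector + "\t" + query
    # Encode as 7-bit ASCII; non-ASCII input raises UnicodeEncodeError.
    return (line + CRLF).encode("ascii")


def fetch(host: str, selector: str, port: int = 70,
          query: Optional[str] = None, timeout: float = 10.0) -> bytes:
    """Send one request and read the reply until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(selector, query))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, `fetch("gopher.floodgap.com", "")` would request Floodgap's root menu, since an empty selector asks a server for its top-level directory.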
> As my search engine's indexer works offline, my spider basically
> crawls around and saves all type 0 and type 1 files to a local cache
> hierarchy. This was mostly accidental, but I managed to create
> something very much like the Internet Archive, but for Gopher.
> Basically, you give the cache manager a URL and it gives you back
> the cached page (if it has it), AND it mangles menus so that as long
> as the pages are in the cache, you'll stay in the cache.
>
> It's kind of like a combination of Google's cache and archive.org,
> only it works better than either of those...
>
> Here's a cached copy of (partial) Floodgap:
> gopher://gophernicus.org/1/cache.q?gopher://gopher.floodgap.com
>
> It even cached itself:
> gopher://gophernicus.org/1/cache.q?gopher://gophernicus.org
>
> Notice how the cached Floodgap is much faster than the original
> one ;D. I wish there was something like this for teh web....
>
> <turtleneck shirt mode on>
> One more thing,
> </turtleneck>
>
> I'll be crawling everything in about a month or so, so now is the
> time to fix your robots.txt if you don't want your files to end up
> in the cache.
Very cool. :-)
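Kim doesn't describe how the menu mangling works internally, but the idea can be sketched. A Gopher menu line is the item type character and a display string, followed by TAB-separated selector, host and port fields, and the menu ends with a line containing only ".". Rewriting a cached type 0 or 1 entry to point at the cache server keeps the reader inside the cache. This Python sketch assumes the cache selector is "/cache.q?" followed by an RFC 4266-style gopher URL, inferred from the example links in Kim's message; `mangle_menu`, `gopher_url` and the `is_cached` callback are hypothetical names.

```python
from typing import Callable, List

CACHE_HOST = "gophernicus.org"  # cache server taken from the example links
CACHE_PORT = 70
CACHE_PREFIX = "/cache.q?"      # assumed: cache selector = prefix + original URL


def gopher_url(item_type: str, selector: str, host: str, port: int) -> str:
    """Format gopher://host[:port]/<type><selector>, omitting port 70."""
    hostport = host if port == 70 else f"{host}:{port}"
    return f"gopher://{hostport}/{item_type}{selector}"


def mangle_menu(menu: str, is_cached: Callable[[str], bool]) -> str:
    """Point cached type 0/1 entries of a menu at the cache server."""
    out: List[str] = []
    for line in menu.split("\r\n"):
        # Pass the "." terminator and the empty piece after the final CRLF through.
        if not line or line == ".":
            out.append(line)
            continue
        item_type, fields = line[0], line[1:].split("\t")
        if item_type in ("0", "1") and len(fields) >= 4:
            selector, host, port = fields[1], fields[2], fields[3]
            if port.isdigit():
                url = gopher_url(item_type, selector, host, int(port))
                if is_cached(url):
                    # Keep the display string and any extra (e.g. Gopher+) fields.
                    fields[1:4] = [CACHE_PREFIX + url, CACHE_HOST, str(CACHE_PORT)]
                    line = item_type + "\t".join(fields)
        out.append(line)
    return "\r\n".join(out)
```

Entries whose targets aren't cached are left untouched, which matches the "as long as the pages are in the cache" caveat: following one of them leaves the cache and goes to the live server.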
--
Mike
"All we wanna do is eat your brains! We're not unreasonable, I mean no
one's gonna eat your eyes." - Re: Your Brains, Jonathan Coulton