[gopher] GopherMole - a gopher media crawler

Mateusz Viste mateusz at viste.fr
Sat Jan 3 11:13:10 UTC 2015


On 01/03/2015 11:46 AM, James Mills wrote:
> Or does a Client query a Gopherd for CAPS
> and if it sees "Encoding: utf-8" assumes *all*
> content it receives from *that* Gopherd is
> encoded in UTF-8?

That's what I was suggesting, yes.

One could argue that a single server might contain a plethora of 
documents, each of which would be encoded in a specific charset, and 
that's certainly a possibility. But in practice, I have always seen 
servers saying (in human language, mainly on their root page) "this 
server is serving content in utf-8", and rarely or never "this specific 
document is encoded in xyz".

But still, the CAPS capability I was suggesting was about a "default"
encoding, that is, "if not specified otherwise, assume everything on
this server is encoded in this encoding". That way, if one day there is
a mechanism that allows specifying the charset on a per-document basis,
the two won't collide (although I doubt such a mechanism will ever
appear, but of course one can never be sure of the future).

Currently, gopher clients are supposed to assume ISO Latin 1, as per RFC
1436. The ServerDefaultCharset CAPS setting I was suggesting in my
message of 31 December 2014 was simply a way to override that RFC
charset.
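
To make the idea concrete, here is a minimal sketch of how a client
might apply such a default. It is an illustration only, not an existing
implementation: it assumes the CAPS data is plain "key=value" text, and
the helper names (server_default_charset, decode_document) and the
sample word are hypothetical. Only the ServerDefaultCharset key and the
ISO Latin 1 fallback come from the discussion above.

```python
# Fallback used when the server declares nothing (ISO Latin 1, as discussed above).
RFC_DEFAULT_CHARSET = "iso-8859-1"

# Sample Polish word with diacritics, written with escapes to stay ASCII-safe.
SAMPLE = "za\u017c\u00f3\u0142\u0107"


def server_default_charset(caps_text):
    """Return the ServerDefaultCharset value from CAPS text, or None.

    Assumption: the CAPS data is a list of key=value lines, with '#'
    starting a comment line. Lines without '=' are ignored.
    """
    for line in caps_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "serverdefaultcharset":
            return value.strip() or None
    return None


def decode_document(raw, caps_text=None):
    """Decode document bytes using the server default, else ISO Latin 1."""
    declared = server_default_charset(caps_text) if caps_text else None
    charset = declared or RFC_DEFAULT_CHARSET
    try:
        return raw.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name declared by the server: use the legacy default.
        return raw.decode(RFC_DEFAULT_CHARSET)


if __name__ == "__main__":
    raw = SAMPLE.encode("utf-8")
    # Without a declared default, every byte above 127 becomes a stray character.
    print(decode_document(raw))
    # With a declared default, the diacritics survive.
    print(decode_document(raw, "CAPS\nServerDefaultCharset=UTF-8\n"))
```

A legacy client that ignores the setting behaves like the first call:
the ASCII letters come through, the diacritics do not.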

Mateusz





> On Sat, Jan 3, 2015 at 8:38 PM, Mateusz Viste <mateusz at viste.fr
> <mailto:mateusz at viste.fr>> wrote:
>
>     On 01/03/2015 11:27 AM, James Mills wrote:
>
>         Mis-rendered, correct (which is what I meant),
>         but the client "won't break".
>
>
>     That's correct.
>
>         That's what I meant by "degrade".
>
>
>     Sure, but that's hardly 'graceful'. And doesn't have anything to do
>     with ISO-8859-1. Which doesn't mean I am opposed to UTF-8 usage in
>     the gopherspace, on the contrary, I'm 100% for it. But it's
>     important to keep in mind the exact impact it will have on legacy
>     clients.
>
>         *I think* a Gopher server that spits out UTF-8 encoded data to
>         a Client that doesn't support UTF-8 encoding will still display
>         the content (just not any codepoint higher than 255)?
>
>
>     Only low-ascii will be rendered correctly, that is anything above
>     code point 127 will be scrambled.
>
>     Here's an example:
>
>     gopher://gopher.viste.fr/0/docs/other/Little%2520Big%2520Adventure%2520-%2520Soluce%2520du%2520jeu%2520%2528french%2529.txt
>
>     Same thing here (but with a Polish document):
>
>     gopher://gopher.viste.fr/0/docs/opowiadania%2520%2528polish%2529/sendbajt.txt
>
>     When I open these documents with Overbite, all French or Polish
>     diacritics are broken (until I set my browser manually to UTF-8).
>
>     Of course there are thousands of such examples across the gopherspace.
>
>     Mateusz
>
>
>     _______________________________________________
>     Gopher-Project mailing list
>     Gopher-Project at lists.alioth.debian.org
>     http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/gopher-project


