[gopher] gopher++ (gopher1) protocol

Kim Holviala kim at holviala.com
Sun Jan 10 22:46:26 UTC 2010


I think I promised to write down my thoughts about the gopher++ 
extensions to the original gopher0 protocol. I was going to implement 
them first, make sure they work, and only then document everything and 
send the explanations to this list... but I'm in the middle of a major 
rewrite of kgopherd, so the implementation has to wait for a week or so. 
Besides, even if I got kgopherd up and running with gopher++, I'd still 
be missing a client (which I'll write, eventually).

So here goes. These are mostly untested thoughts (that I WILL try out as 
soon as possible). This is written in a somewhat RFC-like official 
format... but only somewhat.

===============================

Gopher++ protocol
Kim Holviala <kim at holviala.com>


== A primer to the original gopher0 ==

The original gopher0 protocol from rfc1436 is as follows (C for client 
traffic, S for server replies):

C: <opens the connection>
C: /path/to/resource
S: <dumps the resource to the client>
S: <closes the connection>

The "resource" above can be either a file, or a specially formatted 
gopher menu (see rfc1436). Menus are just simple text files which 
describe the file resources that can be downloaded. Menus also tell the 
client what type of a file the resource is (text, image etc).
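
The gopher0 exchange above can be sketched in a few lines of Python. 
This is only an illustration: the function names are made up for this 
example, and the default port 70 and the CRLF after the selector come 
from rfc1436, not from anything written in this post.

```python
import socket


def build_gopher0_request(selector: str) -> bytes:
    """A gopher0 request is just the selector followed by CRLF."""
    return selector.encode("ascii") + b"\r\n"


def fetch_gopher0(host: str, selector: str, port: int = 70,
                  timeout: float = 10.0) -> bytes:
    """Send the selector, then read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_gopher0_request(selector))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closing the connection ends the reply
                break
            chunks.append(data)
    return b"".join(chunks)
```

Note that the reply is just raw bytes: nothing in it says whether it is 
a menu, a text file or an image, which is exactly the gap described next.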

There are a few major problems with that simplistic approach, mainly:

1) the server doesn't know if the client wants a file, or a menu
2) the client doesn't know if it's getting a file, or a menu
3) the client has no knowledge of the charset of the file/menu
4) the client doesn't know what type of a file it's getting

Problems 1 & 2 don't seem that important at first, but consider this: 
the server has a menu and a file. Both are (accidentally) removed from 
the server but the clients keep requesting them. What kind of an error 
page will the server generate? A menu, or just plain text? The server 
cannot know since it doesn't know what the client expects.

Problem number 3 is a common one across all protocols. Server uses 
charset A while the client wants charset B. That's all fine, except that 
gopher0 cannot transfer the encoding information between the server and 
the client.

Problem number 4 is an interesting one. If the resource (file) is tagged 
by the server as type g, the client can be fairly certain that it's 
getting a GIF image, except when the image has been removed and the 
server sends an error message in menu format instead. If the resource is 
tagged as I (generic image), the server can send out pretty much 
anything and the client has no idea what it's getting.


== How HTTP solves these problems ==

Gopher's rival HTTP has solved these problems, in a way. In HTTP the 
client asks for a resource, and the server sends back a description of 
the data followed by the actual data. It is then the client's 
responsibility to figure out if the data it got back has anything to do 
with the data it requested.

This is really simple for the server, as it can dump pretty much 
anything to the client as long as it's documented (with Content-Type et 
al.). But it's a pain for the client, which needs to understand every 
file format and charset in the world (as it has no idea what it's 
getting). Hence the size of modern web browsers.


== gopher++ (gopher1) protocol ==

To solve these problems with gopher I'm suggesting the following 
extensions which I call gopher++ (or gopher1).

A gopher++ transaction goes like this (again, C for client and S for 
server):

1 C: <opens the connection to gopher.holviala.com>
2 C: /path/to/resource
3 C: Host: gopher.holviala.com
4 C: Accept-Charset: UTF-8
5 C: Accept: text/plain
6 C: Referer: gopher://gopher.holviala.com/t/path/to/menu
7 C: User-Agent: gopher++/0.1
8 S: <dumps the resource to the client>
9 S: <closes the connection>

Lines 1 & 2 are identical to the original gopher0 protocol, and so are 
lines 8 & 9. This makes gopher++ 100% backwards- and forwards-compatible 
with gopher0. A gopher0 server never reads the additional headers the 
gopher++ client sends. A gopher0 client connecting to a gopher++ server 
gets back the resource just as it would from an older gopher0 server.
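
On the server side, the compatibility claim boils down to "the first 
line is the selector, anything after it is optional". Here is a rough 
sketch of a parser that accepts both kinds of request. The name 
parse_request is hypothetical, and two things are my assumptions rather 
than part of the proposal: header names are matched case-insensitively, 
and an empty line (if one arrives) ends the header block. How a real 
server decides that no more headers are coming is not specified above, 
so this only parses bytes that have already been read.

```python
from typing import Dict, Tuple


def parse_request(raw: bytes) -> Tuple[str, Dict[str, str]]:
    """Split a request into (selector, headers).

    A plain gopher0 request yields an empty header dict.
    """
    lines = [ln.rstrip("\r") for ln in
             raw.decode("ascii", errors="replace").split("\n")]
    selector = lines[0]
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        if not line:  # assumption: a blank line (or end of data) ends headers
            break
        name, sep, value = line.partition(":")
        if sep:  # ignore anything that is not "Name: value"
            headers[name.strip().lower()] = value.strip()
    return selector, headers
```

A gopher0 request such as b"/path/to/resource\r\n" comes back with no 
headers, so the server can simply fall back to gopher0 behaviour.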

If a gopher++ client is talking to a gopher++ server then the extra 
headers come into effect.

The Host: header tells the server the original hostname the client was 
connecting to. This header allows the server to serve multiple hostnames 
under one IP address (virtual hosting). A gopher++ client MUST always 
send the Host header.

The Accept-Charset: header tells the server what charset the client can 
handle. If a client sends the Accept-Charset header, the server MUST 
send its reply using the charset specified. If charsets have no meaning 
for the resource (an image, for example), the server can ignore this 
header. If the client does not send the Accept-Charset header, or if the 
server doesn't recognize the charset the client requested, the server 
MUST serve the resource using the 7-bit US-ASCII charset (if applicable).
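
The fallback rules in the paragraph above are easy to get subtly wrong, 
so here is a small sketch of them. The function name choose_charset, 
the is_text flag and the exact spellings accepted in the lookup table 
(for example "ISO-8859-1" as an alias for Latin-1) are assumptions made 
for this example. The values are Python codec names.

```python
from typing import Optional

# Charset names a server might recognize, mapped to Python codec names.
# Which spellings to accept is an assumption; the proposal does not say.
KNOWN_CHARSETS = {
    "us-ascii": "ascii",
    "ascii": "ascii",
    "latin-1": "latin-1",
    "iso-8859-1": "latin-1",
    "utf-8": "utf-8",
}


def choose_charset(accept_charset: Optional[str],
                   is_text: bool = True) -> Optional[str]:
    """Return the codec to reply in, or None if charsets do not apply."""
    if not is_text:
        return None  # binary resource: the header is ignored
    if accept_charset is None:
        return "ascii"  # header missing: fall back to US-ASCII
    # An unrecognized charset also falls back to US-ASCII.
    return KNOWN_CHARSETS.get(accept_charset.strip().lower(), "ascii")
```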

The Accept: header tells the server which type of content the client 
expects. If the client sends the Accept header to the server, the server 
MUST send its reply using the format the client requested. If the client 
does not send the Accept header, if the server doesn't recognize the 
content-type the client requested, or if the client requests a 
content-type of "application/octet-stream", the server must serve the 
resource in its original format, or in the format it thinks the client 
expects.
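
As a sketch, the decision the server makes for the Accept header could 
look like this. The name resolve_content_type and the convertible_to 
parameter (the set of content-types the server knows how to produce 
from this particular resource) are hypothetical; the proposal only 
describes the behaviour, not an interface.

```python
from typing import Iterable, Optional


def resolve_content_type(accept: Optional[str], original_type: str,
                         convertible_to: Iterable[str]) -> str:
    """Pick the content-type the server will actually send."""
    if accept is None:
        return original_type  # no Accept header: send as is
    accept = accept.strip().lower()
    if accept == "application/octet-stream":
        return original_type  # client explicitly wants the original
    if accept == original_type or accept in set(convertible_to):
        return accept  # already right, or the server can transcode
    return original_type  # unrecognized or impossible request
```

For instance, a JPEG requested with "Accept: image/gif" resolves to 
image/gif if the server can convert images, while a text file requested 
with "Accept: image/png" just resolves back to text/plain.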

The optional Referer: header tells the server which URL the client came 
from. This header is purely for the server's benefit and the client can 
refuse to send it for privacy reasons.

The User-Agent: header contains the client application name (and 
possibly version). Clients SHOULD send this header as it helps servers 
track down misbehaving clients.
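
Putting the headers together, a client could assemble its request 
roughly like this. The function name build_request is hypothetical. The 
CRLF line endings for header lines are an assumption (gopher0 uses CRLF 
after the selector, so I reuse it here), and no terminating blank line 
is sent because the proposal does not define one.

```python
from typing import Optional


def build_request(selector: str, host: str,
                  accept_charset: Optional[str] = None,
                  accept: Optional[str] = None,
                  referer: Optional[str] = None,
                  user_agent: Optional[str] = "gopher++/0.1") -> bytes:
    """Build a gopher++ request: the selector line, then the headers."""
    lines = [selector, f"Host: {host}"]  # Host is always sent
    optional = [
        ("Accept-Charset", accept_charset),
        ("Accept", accept),
        ("Referer", referer),
        ("User-Agent", user_agent),
    ]
    # Only include the headers the caller actually supplied.
    lines += [f"{name}: {value}" for name, value in optional if value]
    return ("\r\n".join(lines) + "\r\n").encode("ascii")
```

Because the selector is still the very first line, a gopher0 server 
that stops reading after it never notices the rest.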


== Accept-headers and transcoding ==

The Accept-Charset and Accept headers in gopher++ require some more 
explanation. As said above, if those headers exist, the server MUST obey 
them. As the client cannot know the kind of data it's getting back from 
the server, it must rely on the server to send exactly what was 
requested.

For Accept-Charset, if the server does not have the resource in the 
correct charset the server MUST transcode the textual information to the 
charset the client requested (if applicable). This moves the burden of 
charset conversions from the client to the server. In gopher++ the 
client never has to transcode textual information from one charset to 
another.
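
A minimal sketch of that server-side conversion follows. The name 
transcode_text and its parameters are made up for this example, and it 
assumes the server knows which charset the file is stored in. What to 
do with characters that don't exist in the target charset is not 
covered by the proposal; replacing them with "?" is my assumption.

```python
def transcode_text(data: bytes, stored_charset: str,
                   requested_charset: str) -> bytes:
    """Re-encode text from the stored charset to the requested one."""
    text = data.decode(stored_charset)
    # Assumption: characters missing from the target charset become "?".
    return text.encode(requested_charset, errors="replace")
```

So a UTF-8 file containing "ö" goes out as the single byte 0xF6 to a 
Latin-1 client, and as "?" to a US-ASCII client.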

For the Accept header, if the server does not have the resource in the 
correct content-type, the server MUST, if at all possible, transcode the 
content to fit the client's requirements. This requirement lets clients 
stay small and fast, as they do not have to carry support for all 
possible resource formats, nor do they have to be recoded to recognize 
completely new formats.

The server MUST be able to offer all plain text information (text files 
and gopher menus) in US-ASCII, Latin-1 and UTF-8 charsets. A client 
SHOULD NOT request anything other than those three charsets.

The server MUST be able to convert all image resources to GIF, PNG and 
JPEG formats. A client SHOULD NOT request any format other than those 
three.

The server SHOULD be able to convert PDF and PostScript resources to 
any of the above three image formats and to plain text. A client SHOULD 
NOT ask for anything else.

The server SHOULD be able to convert all audio resources to WAV, MP3 or 
OGG Vorbis. A client SHOULD NOT ask for any other format.

The server SHOULD be able to convert all video resources to either MPEG 
or OGG Theora. As video transcoding is CPU-intensive and video formats 
are a moving target, the server is not obligated to obey client requests 
for video formats. A client SHOULD NOT ask for anything other than MPEG 
or OGG Theora, or "application/octet-stream" if it wants the original 
video stream.


== gopher0 filetypes and request content-types ==

The table below maps old gopher0 filetypes to their matching gopher++ 
content-types. The video filetype is "v" instead of the commonly used ";".

For example, if a gopher0 menu specifies that a resource is of type "1", 
a client SHOULD NOT ask for anything other than application/gopher-menu. 
If the resource is of type "p", the server must be prepared to convert 
the PDF file to a static image or plain text. In all cases the client 
can ask for "application/octet-stream", in which case the server sends 
the resource as is.

gopher0  content-types
=======  =============
   0      text/plain
   1      application/gopher-menu
   7      application/gopher-menu
   9      application/octet-stream
   g      image/gif, image/png, image/jpeg
   h      text/html, text/plain
   I      image/gif, image/png, image/jpeg
   p      application/pdf, image/*, text/plain
   s      audio/wav, audio/mpeg, audio/ogg
   v      video/mpeg, video/ogg


== Examples ==

These examples lack the optional Referer: and User-Agent: headers for 
clarity.

Client requests the root menu:
C: <opens the connection>
C:
C: Host: foo.bar
C: Accept-Charset: UTF-8
C: Accept: application/gopher-menu
S: <sends the menu in UTF-8>
S: <closes the connection>

Client uses an external PDF reader:
C: <opens the connection>
C: /doc/document.pdf
C: Host: foo.bar
C: Accept-Charset: US-ASCII
C: Accept: application/octet-stream
S: <dumps the original pdf, will NOT transcode to US-ASCII>
S: <closes the connection>

Client wants to show the PDF in its own window as text:
C: <opens the connection>
C: /doc/document.pdf
C: Host: foo.bar
C: Accept-Charset: US-ASCII
C: Accept: text/plain
S: <converts the pdf to US-ASCII text and dumps it>
S: <closes the connection>

Client doesn't know how to handle jpeg images:
C: <opens the connection>
C: /images/image.jpeg
C: Host: foo.bar
C: Accept: image/gif
S: <converts the jpeg to gif and dumps the result to the client>
S: <closes the connection>

Client is just being stupid:
C: <opens the connection>
C: /doc/rfc1436.txt
C: Host: foo.bar
C: Accept-Charset: Latin-1
C: Accept: image/png
S: <dumps the original rfc and ignores the image conversion request>
S: <closes the connection>

Client is for blind people:
C: <opens the connection>
C: /doc/rfc1436.txt
C: Host: foo.bar
C: Accept-Charset: Latin-1
C: Accept: audio/mpeg
S: <may convert the document to audio and then dump the mp3>
S: <closes the connection>

Client is requesting a resource that doesn't exist:
C: <opens the connection>
C: /doc/
C: Host: foo.bar
C: Accept-Charset: UTF-8
C: Accept: application/gopher-menu
S: <sends an error message as gopher0 menu>
S: <closes the connection>

Client is requesting an image that doesn't exist:
C: <opens the connection>
C: /images/missing.jpeg
C: Host: foo.bar
C: Accept: application/octet-stream
S: <either sends the error as a JPEG image, or sends nothing>
S: <closes the connection>
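
The last two examples show how the Accept header finally lets the server 
pick a sensible error format. A rough sketch of that choice is below. 
The name error_response is hypothetical. The menu line uses item type 
"3" (error) and the tab-separated layout from rfc1436, with 
"error.host" and port 1 as placeholder values. Sending plain text when 
there is no Accept header, and sending nothing for non-text types, are 
my assumptions (the latter is one of the two options in the last 
example).

```python
from typing import Optional


def error_response(accept: Optional[str], message: str,
                   charset: str = "ascii") -> bytes:
    """Format an error so that it matches what the client asked for."""
    if accept == "application/gopher-menu":
        # rfc1436 menu line: type + text, selector, host, port, then "."
        menu = f"3{message}\t\terror.host\t1\r\n.\r\n"
        return menu.encode(charset, errors="replace")
    if accept is None or accept.startswith("text/"):
        return (message + "\r\n").encode(charset, errors="replace")
    return b""  # image, audio etc.: send nothing rather than text
```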









