Sunday, March 6, 2011

Interesting bits from HTTP/1.1 RFC

(Originally under the title "Is your server HTTP/1.1 compliant?" - but I realised that it's not really a relevant one)

Today, after watching this excellent Wireshark kung-fu video with Hansang Bae, I decided to comb through the HTTP/1.1 spec and see what other interesting bits I could fish out of there - ones that are played with less frequently or are otherwise noteworthy.
Here they go for your entertainment.

To allow for transition to absoluteURIs in all requests in future
versions of HTTP, all HTTP/1.1 servers MUST accept the absoluteURI
form in requests, even though HTTP/1.1 clients will only generate
them in requests to proxies.

In layman's terms: your compliant server must understand not only the classic "GET / HTTP/1.1", but also "GET http://www.yourhost.com/ HTTP/1.1" - in case the clients upgrade. All but one of the servers that I did a quick test with seem to have never seen this part of the spec. Or optimized it out.
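For the curious, here is a minimal sketch of how such a test can be done by hand - a raw socket sending the absolute-URI request line. The local echo server below is only a stand-in for whatever server you want to probe; the handler and addresses are made up for illustration:

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoHandler(BaseHTTPRequestHandler):
    # Minimal stand-in server: echoes back whatever request path it parsed.
    def do_GET(self):
        body = self.path.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the test output quiet
        pass

def probe_absolute_uri(host, port):
    """Send a GET with an absoluteURI request line; return the status line."""
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(
            (f"GET http://{host}:{port}/ HTTP/1.1\r\n"
             f"Host: {host}:{port}\r\n"
             "Connection: close\r\n\r\n").encode()
        )
        data = b""
        while chunk := s.recv(4096):
            data += chunk
    return data.split(b"\r\n", 1)[0].decode()

server = HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status = probe_absolute_uri("127.0.0.1", server.server_port)
print(status)  # a 2xx status line means the server accepted the absolute form
server.shutdown()
```

Point `probe_absolute_uri` at your own server instead of the local stand-in to see which camp it falls into.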

An origin server that does not allow resources to differ by the
requested host MAY ignore the Host header field value when
determining the resource identified by an HTTP/1.1 request. (But see
section 19.6.1.1 for other requirements on Host support in HTTP/1.1.)

Immediately after that follows a big blurb about how the server is supposed to derive the host name from the absolute URI - i.e. the very form that almost no-one seems to support. So the implementations either deliberately ignore the spec, or were not attentive in reading it?
The in-progress work from the HTTPBis working group in the IETF also specifies the absolute URIs. So, some house cleaning will be in order.

A very interesting bit about the pipelining:

Clients which assume persistent connections and pipeline immediately
after connection establishment SHOULD be prepared to retry their
connection if the first pipelined attempt fails. If a client does
such a retry, it MUST NOT pipeline before it knows the connection is
persistent. Clients MUST also be prepared to resend their requests if
the server closes the connection before sending all of the
corresponding responses.


This general 'must be prepared to act robustly' statement makes me think of all sorts of interesting failure modes (yes, and indeed I've seen some of those in real life) - however, this 'retry' also brings a potential L7 hook for the Happy Eyeballs logic. In some form, maybe, later. Having an L7 hook would be a good thing - the application may have a better idea about failures than layers 3/4. Anyway, I digress.
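To see what pipelining looks like on the wire, here is a sketch that writes several requests back-to-back before reading any response. The local HTTP/1.1 echo server and the paths are assumptions for illustration; a real client would also need the retry logic the RFC quote describes:

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # persistent connections, so pipelining works

    def do_GET(self):
        body = self.path.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass

def pipeline(host, port, paths):
    """Write all requests in one burst, then read until the server closes."""
    reqs = b"".join(
        f"GET {p} HTTP/1.1\r\nHost: {host}\r\n".encode()
        + (b"Connection: close\r\n" if p == paths[-1] else b"")
        + b"\r\n"
        for p in paths
    )
    with socket.create_connection((host, port), timeout=5) as s:
        s.sendall(reqs)
        data = b""
        while chunk := s.recv(4096):
            data += chunk
    return data.count(b"200 OK")  # number of responses that came back

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
n = pipeline("127.0.0.1", server.server_port, ["/a", "/b", "/c"])
print(n)  # expect one response per pipelined request
server.shutdown()
```

If the server closed the connection after the first response, `n` would come back smaller than the number of requests - exactly the case where the RFC tells the client to resend.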

Another interesting piece:


This means that clients, servers, and proxies MUST be able to recover
from asynchronous close events. Client software SHOULD reopen the
transport connection and retransmit the aborted sequence of requests
without user interaction so long as the request sequence is
idempotent (see section 9.1.2).


This is also a HUGE RED FLAG for application developers: never EVER use "GET" for anything that is not idempotent. Theoretically the server should not see the request twice, but under some conditions it might (say, with a proxy in between?). All of this is still true for the specs that are being prepared in the HTTPBis WG now.
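A contrived sketch of why this matters - assuming a hypothetical server-side handler that (wrongly) mutates state on GET, an automatic retry after an asynchronous close applies the side effect twice:

```python
# Hypothetical server-side state touched by GET handlers (names made up).
balance = {"account": 100}

def handle_get_withdraw(amount):
    # Non-idempotent: every call changes state. Never put this behind GET.
    balance["account"] -= amount
    return balance["account"]

def handle_get_balance():
    # Idempotent: repeated calls return the same result with no side effects.
    return balance["account"]

# A client that retries "without user interaction", as the RFC allows:
for attempt in range(2):      # first try + one automatic retry
    handle_get_withdraw(10)   # the server executes the withdrawal twice

print(balance["account"])  # 80, not 90 - the retry double-charged
```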

Here's the well-known humorous piece:


Clients that use persistent connections SHOULD limit the number of
simultaneous connections that they maintain to a given server. A
single-user client SHOULD NOT maintain more than 2 connections with
any server or proxy.


Yeah, right. Web 2.0 apps do exactly that. Not.


The Max-Forwards request-header field MAY be used to target a
specific proxy in the request chain. When a proxy receives an OPTIONS
request on an absoluteURI for which request forwarding is permitted,
the proxy MUST check for a Max-Forwards field. If the Max-Forwards
field-value is zero ("0"), the proxy MUST NOT forward the message;
instead, the proxy SHOULD respond with its own communication options.


Is there already an HTTP-level "traceroute" to poke at the caches on the way?
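As a thought experiment, such a "traceroute" could be sketched by sending OPTIONS with an increasing Max-Forwards, one value per hop. There is no proxy chain here, so the local test server below merely echoes back the Max-Forwards it saw; the handler and the X-Saw-Max-Forwards header name are made up for illustration:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class OptionsHandler(BaseHTTPRequestHandler):
    # Stand-in for a hop: answers OPTIONS with its own capabilities,
    # echoing the Max-Forwards value it received.
    def do_OPTIONS(self):
        self.send_response(200)
        self.send_header("Allow", "GET, OPTIONS")
        self.send_header("X-Saw-Max-Forwards",
                         self.headers.get("Max-Forwards", ""))
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass

def http_traceroute(host, port, max_hops=3):
    """Probe successive hops with OPTIONS * and Max-Forwards = 0, 1, 2, ..."""
    hops = []
    for ttl in range(max_hops):
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("OPTIONS", "*", headers={"Max-Forwards": str(ttl)})
        resp = conn.getresponse()
        hops.append((ttl, resp.status, resp.getheader("X-Saw-Max-Forwards")))
        conn.close()
    return hops

server = HTTPServer(("127.0.0.1", 0), OptionsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
hops = http_traceroute("127.0.0.1", server.server_port)
for ttl, status, seen in hops:
    print(ttl, status, seen)
server.shutdown()
```

Against a real proxy chain, the hop with Max-Forwards 0 would answer itself, the next one would be answered by the second proxy, and so on - the HTTP analogue of TTL-based traceroute.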

Kind of obvious, but interesting clarification nonetheless:


The fundamental difference between the POST and PUT requests is
reflected in the different meaning of the Request-URI. The URI in a
POST request identifies the resource that will handle the enclosed
entity. That resource might be a data-accepting process, a gateway to
some other protocol, or a separate entity that accepts annotations.
In contrast, the URI in a PUT request identifies the entity enclosed
with the request -- the user agent knows what URI is intended and the
server MUST NOT attempt to apply the request to some other resource.


The "correct" code for post-POST redirections should be 303, not 302 - but 302 is kept around for "older clients":


Note: Many pre-HTTP/1.1 user agents do not understand the 303
status. When interoperability with such clients is a concern, the
302 status code may be used instead, since most user agents react
to a 302 response as described here for 303.


A fun fact I found while testing this: both Firefox and Chromium send exactly 21 requests before giving up and saying "it's a redirect loop" - even if the target URIs in the "Location:" header of each reply are different. Buyer, beware. It's more than the "old 5" that the spec warns about, but there's no ultra-clever heuristic for detecting the redirect loop, either.
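A sketch of the kind of test server this can be measured against - it always answers 302 with a fresh Location, so the loop never repeats a URI. Here the probing client is Python's urllib (which has its own, smaller redirect limit) rather than a browser; the /hopN path scheme is made up for illustration:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen = []  # request paths the server received

class LoopHandler(BaseHTTPRequestHandler):
    # Always redirect, each time to a *different* URI - a redirect loop
    # that never repeats a Location, like the test described above.
    def do_GET(self):
        seen.append(self.path)
        self.send_response(302)
        self.send_header("Location", f"/hop{len(seen)}")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), LoopHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

try:
    urllib.request.urlopen(f"http://127.0.0.1:{server.server_port}/",
                           timeout=5)
except urllib.error.HTTPError as e:
    # urllib gives up once its redirect limit is exceeded
    print("client gave up after", len(seen), "requests:", e.code)
server.shutdown()
```

Swapping the probing client lets you compare limits: point a browser at the same server and count the requests in Wireshark.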

This stops at section 12; I'll maybe go through the rest tomorrow and see if I can gather some other interesting pieces.
