Lecture 7: Internet Applications #3.3: HTTP/1.1

A nice short lecture this time -- we tie up a few last topics in HTTP.

HTTP/1.0 Performance Issues

HTTP/1.0 has been criticised for poor performance and lack of scalability. There are several aspects to this:

- HTTP/1.0 opens a new TCP connection for every single transaction. For example, if a Web page contains 10 <img ...> images, HTTP/1.0 must open a total of 11 TCP connections -- one for the original page, and 10 for the images. Problems which arise from this include:
  - There can be a moderately long overhead in initial connection establishment due to round-trip delays.
  - TCP initiates connections using the so-called "slow start" algorithm. This is necessary for proper operation, but is very inefficient for short transfers -- TCP typically takes 10 to 20 KB of transferred data to get "up to full speed".
  Both of these can cause HTTP/1.0 browsers to seem really slow.
- TCP is required to maintain "state" information about closed connections for 240 seconds, to ensure that stray packets from old connections won't be interpreted as valid data by a later connection. When a server is handling a large number of connections, this can require huge buffer space, and is very inefficient.
- HTTP/1.0 has limited support for caching.
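To get a feel for the connection-establishment overhead, here is a rough back-of-the-envelope sketch. It assumes (purely for illustration) that connections are opened one after another, that each TCP handshake costs one round trip before the request can be sent, and that the round-trip time is 100 ms. The function name and figures are hypothetical, not measurements.

```python
def handshake_overhead_ms(num_objects: int, rtt_ms: int, persistent: bool) -> int:
    """Time spent on TCP handshakes alone, ignoring slow start and transfer time.

    Assumption: one round trip per handshake, connections opened sequentially.
    """
    connections = 1 if persistent else num_objects
    return connections * rtt_ms


# The page from the example above: 1 HTML document plus 10 images.
objects_on_page = 1 + 10
http10 = handshake_overhead_ms(objects_on_page, rtt_ms=100, persistent=False)
reused = handshake_overhead_ms(objects_on_page, rtt_ms=100, persistent=True)
print(f"one connection per object: {http10} ms, single reused connection: {reused} ms")
```

Real browsers open several connections in parallel, so the actual penalty is smaller than this sequential estimate suggests, but the per-connection cost is still paid each time.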

Because of these aspects, HTTP/1.0 is gradually being replaced by HTTP/1.1 (RFC 2616).

HTTP/1.1 Basics

HTTP/1.1 (RFC 2616) is now in widespread use. It extends the older protocol in a number of areas, notably persistent connections/pipelining and support for caching.

To implement persistent connections, HTTP/1.1 uses a request (and also response) header called "Connection:". Two values matter here: "close", which means that the connection is not persistent and will be closed once the current response is complete, and "keep-alive", which asks for the TCP connection to be held open. In HTTP/1.1 itself connections are persistent by default, so the TCP connection is held until either side sends a "Connection: close" header, indicating that it wishes to terminate.
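The decision a client or server has to make can be sketched as a small function. This is only an illustration: the function name is made up, and it assumes the headers have already been parsed into a dictionary with lower-case names. It follows the RFC 2616 rule that HTTP/1.1 connections persist unless "close" is sent, while an HTTP/1.0 peer has to ask for "keep-alive" explicitly.

```python
from typing import Dict


def is_persistent(http_version: str, headers: Dict[str, str]) -> bool:
    """Decide whether the connection should stay open after this message.

    Assumption: `headers` maps lower-case header names to their raw values.
    """
    raw = headers.get("connection", "")
    tokens = {t.strip().lower() for t in raw.split(",") if t.strip()}
    if "close" in tokens:
        return False  # either side may ask to terminate
    if http_version == "HTTP/1.1":
        return True  # persistent by default in HTTP/1.1
    return "keep-alive" in tokens  # HTTP/1.0 must opt in


print(is_persistent("HTTP/1.1", {}))
print(is_persistent("HTTP/1.1", {"connection": "close"}))
print(is_persistent("HTTP/1.0", {"connection": "Keep-Alive"}))
```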

The browser can utilise a persistent connection by sending multiple requests over the connection without stopping and waiting for each of them to be satisfied before sending the next -- the responses are "in the pipeline". Similarly, the server can send its responses one after another, in the same order as the requests. This is possible because the end of each message, and hence the start of the next, can be unambiguously identified using the "Content-Length:" headers. The huge wins here, obviously, are that there's no delay opening multiple TCP connections, and the slow-start algorithm has time to get up to full speed.
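To see how "Content-Length:" lets a client pull apart responses that arrive back-to-back on one connection, here is a minimal sketch. It assumes the whole byte stream is already in memory and that every response carries a Content-Length header. Chunked transfer encoding and responses without a body length are deliberately ignored, and the function name is hypothetical.

```python
from typing import List, Tuple


def split_responses(stream: bytes) -> List[Tuple[bytes, bytes]]:
    """Split pipelined HTTP responses into (header block, body) pairs."""
    responses = []
    pos = 0
    while pos < len(stream):
        head_end = stream.find(b"\r\n\r\n", pos)
        if head_end == -1:
            break  # incomplete header block; a real client would read more data
        head = stream[pos:head_end]
        length = 0
        # Skip the status line, then look for Content-Length among the headers.
        for line in head.split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        body_start = head_end + 4
        responses.append((head, stream[body_start:body_start + length]))
        pos = body_start + length  # the next response starts right here
    return responses


stream = (
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
    b"HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nabc"
)
for head, body in split_responses(stream):
    print(head.split(b"\r\n")[0], body)
```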

Web Caching

The World Wide Web has been spectacularly successful -- so successful that a huge proportion of Internet traffic is HTTP, ie Web pages and related objects such as images. Caching is a technique whereby copies of popular objects are kept in strategic locations, and supplied in lieu of the originals, saving huge amounts of traffic on the "backbone networks".

The Conditional-GET operation seen earlier allows support for caching at the browser level -- that is, the browser can keep a local copy of an object and check if it's up to date before displaying it. Two additional features of HTTP/1.0 were:

- The Expires: response header was used to indicate that an entity had a limited (specified) "lifetime". This permits finer control over the Conditional-GET operation. It takes an Internet-standard date/time string as its value.
- The Pragma: no-cache response header has an obvious meaning: this entity should never be stored in a cache.
Note: the (non-standard) Refresh: response header can be used (in some browsers) to force a reload of an entity.
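As an illustration of how a browser cache might use Expires:, the sketch below parses the date string with Python's standard email.utils module and compares it with the current time. The function name is hypothetical, and treating an unparseable value as "already expired" is an assumption made here to keep the example on the safe side.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header: str, now: datetime) -> bool:
    """Return True if a cached copy may still be used without checking the server."""
    try:
        expires = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        return False  # assumption: an invalid date means "already expired"
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    return now < expires


now = datetime(1994, 12, 1, 15, 0, 0, tzinfo=timezone.utc)
print(is_fresh("Thu, 01 Dec 1994 16:00:00 GMT", now))
print(is_fresh("Thu, 01 Dec 1994 14:00:00 GMT", now))
```

Once the copy is no longer fresh, the browser would fall back to a Conditional-GET to find out whether it has actually changed.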

Additionally, HTML "<META HTTP-EQUIV=..." tags can include "equivalent" response information in the <HEAD> section of an HTML document. The browser may regard this as being equivalent to the corresponding HTTP response header.
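A browser-side view of this can be sketched with Python's standard html.parser module, which collects the "equivalent" headers from a document. The class name is made up for this example, and what a real browser does with the collected values is left open.

```python
from html.parser import HTMLParser


class MetaHttpEquivParser(HTMLParser):
    """Collect <META HTTP-EQUIV=... CONTENT=...> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.headers = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("http-equiv")
        content = attr.get("content")
        if name and content is not None:
            self.headers[name.lower()] = content


parser = MetaHttpEquivParser()
parser.feed(
    '<HTML><HEAD><META HTTP-EQUIV="Expires" '
    'CONTENT="Thu, 01 Dec 1994 16:00:00 GMT"></HEAD></HTML>'
)
print(parser.headers)
```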

Proxy Caches

A proxy server is an HTTP server which fetches Web objects (pages, images, etc) on behalf of its clients. Proxies normally cache all "cacheable" responses, so that if an entity is stored locally, it is returned instead of sending a request to the originating server. Such shared caches can significantly reduce an organisation's "download volume", as well as give significant performance improvements to the end-user.

Requests to a proxy server are always specified as full URLs, so the first line of a typical GET request now looks like:

GET http://www.bendigo.latrobe.edu.au/index.html HTTP/1.0
....other request headers...<newline><newline>
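The difference between the two forms of request line can be sketched as follows. The function name is hypothetical, and only the first line of the request is built, not the headers that follow it.

```python
from urllib.parse import urlsplit


def request_line(url: str, via_proxy: bool, version: str = "HTTP/1.0") -> str:
    """Build the first line of a GET request for `url`."""
    if via_proxy:
        target = url  # a proxy needs the full URL to know which server to contact
    else:
        parts = urlsplit(url)
        target = parts.path or "/"  # an origin server only needs the path
        if parts.query:
            target += "?" + parts.query
    return f"GET {target} {version}"


url = "http://www.bendigo.latrobe.edu.au/index.html"
print(request_line(url, via_proxy=True))
print(request_line(url, via_proxy=False))
```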

[Figure: HTTP proxy server system diagram]

Whilst proxy servers (and caches) were described in HTTP/1.0, the rules as to how caching should be controlled were unspecified.

Cache Control Mechanisms in HTTP/1.1

HTTP/1.1 introduced a new Cache-Control: header which significantly improved the operation of both private (browser) and shared (proxy) caches. This response header is complex, with many possible combinations of values. Some common examples include:

Cache-Control: public: This entity is always cacheable, even in circumstances where it may not be obvious (eg, in response to a request with an Authorization: header).
Cache-Control: private: The response is not to be cached in proxy caches, and is intended for the use of the end-user alone. The response may be cached at the end-user browser.
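Cache-Control: no-cache: Despite the name, this does not forbid storing the response. A cache may keep a copy, but must revalidate it with the originating server before every reuse. The no-store directive is more restrictive still: the response must not be stored in any cache at all.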
Cache-Control: max-age=3600: Specifies a time, in seconds, after which the entity becomes "stale". The s-maxage variant specifically refers to proxy (shared) caches. Both of these are commonly combined with re-validation options, to give (for example):
Cache-Control: max-age=3600, must-revalidate: After 3600 seconds, the freshness of the entity must be checked at the originating server.
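The directives above can be pulled apart with a few lines of code. The sketch below is a simplification: it splits on commas, so it would mishandle quoted arguments that themselves contain commas, and the helper names are invented for this example. It shows how a cache might decide whether it may store a response and for how long the copy stays fresh.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def shared_cache_may_store(directives: Dict[str, Optional[str]]) -> bool:
    """A proxy (shared) cache must not store private or no-store responses."""
    return "private" not in directives and "no-store" not in directives


def freshness_lifetime(directives: Dict[str, Optional[str]], shared: bool) -> Optional[int]:
    """Seconds until the entity becomes stale, or None if not specified."""
    if shared and directives.get("s-maxage") is not None:
        return int(directives["s-maxage"])  # s-maxage overrides max-age for proxies
    if directives.get("max-age") is not None:
        return int(directives["max-age"])
    return None


d = parse_cache_control("max-age=3600, must-revalidate")
print(d, freshness_lifetime(d, shared=False))
```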

Entity Tags in HTTP/1.1

The "Entity Tag" is new in HTTP/1.1 and is used to indicate that two (perhaps apparently unrelated) resources are in fact the same. For example, requests for each of the two Web pages:

http://ironbark.bendigo.latrobe.edu.au/subjects/int21cn/news.html
http://ironbark.bendigo.latrobe.edu.au/subjects/int31bcn/news.html

both return the same Entity Tag header:

ETag: "1cc30e3-88e-404e6d9b"

The client can use an If-None-Match: "1cc30e3-88e-404e6d9b" request header with a GET request to specify the version of the object which it already has. This is a significant improvement over the HTTP/1.0 "Conditional-GET" -- although not all entities are (by default) generated with Entity Tags.
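On the server side, handling If-None-Match: comes down to comparing tags: if the client's tag matches the current one, the server can answer "304 Not Modified" with no body, otherwise it sends the full entity with "200 OK". The sketch below is a minimal illustration under some assumptions: the function name is hypothetical, headers are given as a dictionary with lower-case names, and a "W/" weak-validator prefix is simply stripped before comparing.

```python
from typing import Dict


def _opaque(tag: str) -> str:
    """Strip an optional weak-validator prefix (W/) from an entity tag."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def status_for_conditional_get(current_etag: str, request_headers: Dict[str, str]) -> int:
    """Return 304 if the client's copy is current, else 200."""
    candidate = request_headers.get("if-none-match")
    if candidate is None:
        return 200  # unconditional GET: send the full entity
    tags = [_opaque(t) for t in candidate.split(",")]
    if "*" in tags or _opaque(current_etag) in tags:
        return 304  # Not Modified: the client already has this version
    return 200


etag = '"1cc30e3-88e-404e6d9b"'
print(status_for_conditional_get(etag, {"if-none-match": etag}))
print(status_for_conditional_get(etag, {"if-none-match": '"some-older-tag"'}))
```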

You can discover lots more about HTTP/1.1 at: http://www.w3.org/pub/WWW/Protocols/Specs.html
