apache -- this is what we run on ironbark and redgum; however, there are several other popular server packages, especially those from Microsoft.
    http://ironbark.bendigo.latrobe.edu.au/index.html

This specifies the application protocol (HTTP) used to fetch the object, the domain name where it is located and the local filename of the object on that host (/index.html). The "magic" string :// doesn't mean anything in particular except to signify that it's a URL...
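As a rough illustration of the three parts just described, the example URL can be pulled apart with Python's standard `urllib.parse` module. The choice of Python here is an assumption made for these examples, not something the notes prescribe.

```python
from urllib.parse import urlsplit

# Split the example URL into the parts described above.
url = "http://ironbark.bendigo.latrobe.edu.au/index.html"
parts = urlsplit(url)

print(parts.scheme)    # application protocol: http
print(parts.hostname)  # domain name: ironbark.bendigo.latrobe.edu.au
print(parts.path)      # local filename on that host: /index.html
```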
HTML is a markup language -- documents are (in general) plain ASCII text files, with certain characters reserved to denote markup. Such languages have a long and venerable history in computing (eg starting with *roff, TeX, LaTeX, SGML and subsequently XML).
"<" and ">" -- the "less than" and "greater than" characters, often (rather clumsily IMHO) called "angle brackets". If either of these characters must appear as part of the actual data, they are written as &lt; and &gt; respectively.
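A small sketch of this escaping rule, using Python's standard `html` module (the sample string is made up for illustration):

```python
from html import escape, unescape

# Data containing the two reserved markup characters.
raw = "if a < b and b > c"

# quote=False leaves quotation marks alone; "<", ">" and "&" are still escaped.
encoded = escape(raw, quote=False)
print(encoded)  # if a &lt; b and b &gt; c

# Decoding gives back the original data.
decoded = unescape(encoded)
```

Note that "&" itself must also be escaped (as &amp;amp;), since it introduces the escape sequences.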
<A HREF="...some URL...">link text</A> structure. This was revolutionary!
<TABLE> markup, style sheets, client-side scripting, etc), seamlessly mingling text and graphics into what has become an entirely new form of media. If you're interested to see some very simple hand-crafted HTML, have a look at the document source for these lecture notes...
To revise, in HTTP/0.9 the GET operation was used to obtain HTML "pages" from a server, eg: the "home page" of ironbark at URL http://ironbark.bendigo.latrobe.edu.au/index.html. We first establish a reliable (TCP) connection to the server process waiting at port 80 (HTTP) on ironbark.bendigo.latrobe.edu.au. We then send the single-line request (the first line below) and receive in response the HTML text (the remaining lines):

    GET /index.html
    <HTML>
    <HEAD>
    <TITLE>The Department of Information Technology at La Trobe University, Bendigo</TITLE>
    </HEAD>
    <BODY BGCOLOR="#FFFFFF">
    <!-- ******** Department Header ***************-->
    <IMG SRC="/gifs/irbkname.short.gif" align="right" ALT="La Trobe University, Bendigo">
    <font size="+2">La Trobe University, Bendigo</font>
    ..........etc

HTTP/0.9 actually defined a few other operations besides GET. However, since HTTP/1.0 (RFC 1945) and HTTP/1.1 are now commonly used, we shall defer discussion of them.
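The HTTP/0.9 exchange described above (connect with TCP to port 80, send one request line, read until the server closes the connection) might be sketched as follows. The function names are hypothetical, and many present-day servers no longer answer HTTP/0.9 requests, so treat this as an illustration of the protocol rather than something guaranteed to work against a given host.

```python
import socket


def build_http09_request(path: str) -> bytes:
    """Return the single-line HTTP/0.9 request for `path`."""
    return f"GET {path}\r\n".encode("ascii")


def fetch_http09(host: str, path: str, port: int = 80) -> bytes:
    """Connect with TCP, send the request line, read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_http09_request(path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # connection closed by the server: end of document
                break
            chunks.append(data)
    return b"".join(chunks)
```

There are no response headers in HTTP/0.9, so the end of the document is signalled only by the server closing the connection.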
An HTTP/1.0 GET request looks like:

    GET /index.html HTTP/1.0<newline><newline>

The response from the server consists of a status line, then a number of plain text headers, followed by a blank line and then the requested data object. It's clearly a very similar format to an RFC822 email message:

    GET /index.html HTTP/1.0

    HTTP/1.0 200 OK
    Server: Netscape-Enterprise/3.5.1C
    Date: Sun, 16 Mar 2004 11:48:39 GMT
    Content-type: text/html
    Last-modified: Fri, 14 Mar 2004 02:22:52 GMT
    Content-length: 11378

    <!doctype html public "-//w3c//dtd html 4.0 transitional//en">
    <html>
    <head>
    ........(etc)
HTTP/1.0 200 OK
Server: Netscape-Enterprise/3.5.1C
Date: Sun, 16 Mar 2004 11:48:39 GMT
Last-modified: Fri, 14 Mar 2004 02:22:52 GMT
The "Last-modified:" header is very useful, see the HTTP/1.0 "Conditional-GET" and "HEAD" request types.
Content-length: 11378
Content-type: text/html
"Content-Transfer-Encoding:" (used in MIME-encoded email messages) is not normally used in HTTP because the protocol is designed to handle "8-bit" data. That is, any data at all can be sent after the blank line which signifies the end of the response headers.
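The response layout described above (status line, headers, blank line, data) can be taken apart with a few lines of Python. This is a minimal sketch: `parse_response` is a hypothetical helper, the body is cut short for the example, and a real client would also need to cope with servers that end lines with a bare newline.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.0 response into (status line, headers, body)."""
    # Everything before the first blank line is the status line plus headers.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Server: Netscape-Enterprise/3.5.1C\r\n"
    b"Content-type: text/html\r\n"
    b"Content-length: 11378\r\n"
    b"\r\n"
    b"<html>"  # body truncated for the example
)
status_line, headers, body = parse_response(raw)
version, code, reason = status_line.split(" ", 2)
print(code, headers["content-type"])
```

The body is kept as bytes rather than text, since anything at all may follow the blank line.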
GET Request

It's possible for a GET request (and other HTTP request types, see later) to additionally send a series of optional Request Headers along with the request. For example, here's a typical request to ironbark, snarfed from the local network (with some cosmetic editing):

    GET /index.html HTTP/1.0
    Accept: image/gif, image/jpeg, */*
    Host: ironbark.bendigo.latrobe.edu.au
    User-Agent: Mozilla/4.0 (compatible; MSIE 5.12; Mac_PowerPC)
    Referer: http://bindi.bendigo.latrobe.edu.au/index.html

The request headers are terminated with a blank line -- hence the need for two newlines, as seen in the first slide of today's lecture. It's also possible for the request to contain a "message body", just like a response message -- we defer discussion of this until later.
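To make the role of the terminating blank line concrete, here is a short sketch that assembles such a request. `build_request` is a hypothetical helper name, and only two of the captured headers are reproduced.

```python
def build_request(path: str, headers: dict) -> bytes:
    """Assemble an HTTP/1.0 GET request with optional request headers."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # The final empty line tells the server that the headers are finished.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request(
    "/index.html",
    {
        "Accept": "image/gif, image/jpeg, */*",
        "Host": "ironbark.bendigo.latrobe.edu.au",
    },
)
print(request.decode("ascii"))
```

With an empty header dictionary the result is just the request line followed by two line endings, which is the basic request shown earlier.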
The "Conditional GET" is a GET request that includes the header "If-modified-since:", which takes an HTTP standard GMT time/date string as its value.
For example, earlier we saw an HTTP response with the following header line:

    Last-modified: Fri, 14 Mar 2004 02:22:52 GMT

The browser can cache this object (keep a local copy in case it's requested again soon), and use the local copy instead of going out to the network, which may cause unnecessary delays. The HTTP request would then look like:

    GET /index.html HTTP/1.0
    If-modified-since: Fri, 14 Mar 2004 02:22:52 GMT
    User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US; rv:0.9.4)
    Host: ironbark.bendigo.latrobe.edu.au
    ....etc, as before

If the requested page has not, in fact, been modified since the specified time, it won't be returned -- instead, a "304 Not Modified" response is sent, without a response body -- just the headers. We return to the topic of caching in the next lecture.
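The server's side of this decision might look roughly like the following. This is only a sketch under stated assumptions: `conditional_status` is a hypothetical function, the object's modification time is passed in directly rather than read from a file, and an unparseable date is simply ignored.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime,
                       if_modified_since: Optional[str]) -> int:
    """Return 304 if the object is unchanged since the given time, else 200."""
    if if_modified_since is None:
        return 200  # ordinary GET: always send the object
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: behave as for an ordinary GET
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat bare times as GMT
    return 304 if last_modified <= since else 200


# Modification time taken from the Last-modified header in the notes.
modified = datetime(2004, 3, 14, 2, 22, 52, tzinfo=timezone.utc)
print(conditional_status(modified, "Fri, 14 Mar 2004 02:22:52 GMT"))  # 304
print(conditional_status(modified, None))                             # 200
```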