When a packet is sent "into" the Internet, it passes through a "local" network to the first router -- sometimes called a "gateway". This router examines the packet's destination address and decides which router, of all those it is directly connected to, it should forward the packet to for its next hop. The process is repeated at the next router, and so on, until the packet reaches its destination.
How all of this works -- the format of packets, how routers behave, and so on -- is defined by the Internet Protocol (IP). In general, a protocol is a set of rules together with a set of data structure definitions (the packet formats) which define how a set of operations (in this case, Internet packet delivery) is carried out.
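To make the forwarding decision concrete, here is a minimal sketch in Python. The forwarding table entries and router names are invented purely for illustration, and real routers use far more efficient lookup structures; the point is simply that each router picks the most specific entry matching the destination address.

import ipaddress

# Hypothetical forwarding table: network prefix -> directly connected next hop.
# The prefixes and router names here are invented for illustration.
FORWARDING_TABLE = {
    ipaddress.ip_network("149.144.0.0/16"): "router-A",
    ipaddress.ip_network("149.144.21.0/24"): "router-B",
    ipaddress.ip_network("0.0.0.0/0"): "default-gateway",
}

def next_hop(destination):
    addr = ipaddress.ip_address(destination)
    # The router picks the most specific (longest) prefix containing the address.
    matches = [net for net in FORWARDING_TABLE if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return FORWARDING_TABLE[best]

print(next_hop("149.144.21.60"))   # -> router-B
print(next_hop("192.0.2.1"))       # -> default-gateway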
Two other disasters can befall delivery: a sequence of packets may not arrive in the same order in which they were sent, and a packet can even become duplicated during delivery -- that is, the same packet is received twice.
However, note one important fact -- in this context, unreliable doesn't mean "no good", or "poor quality". It simply says that the delivery system may fail to deliver a packet correctly. In fact, most packets do get delivered correctly. This is because the second design concept for the Internet is best effort -- under normal operation, it works. Sections of the network should only exhibit unreliability under abnormally heavy loads.
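For illustration only, the three failure modes can be mimicked with a toy simulation in Python. The loss, duplication, and re-ordering probabilities below are invented, but notice how rarely anything actually goes wrong -- most packets simply arrive.

import random

# A toy model (invented probabilities) of best-effort delivery: most packets
# arrive intact, but a few may be lost, duplicated, or re-ordered.
def unreliable_deliver(packets, loss=0.02, dup=0.01, swap=0.01):
    arrived = []
    for p in packets:
        if random.random() < loss:
            continue                    # packet lost in transit
        arrived.append(p)
        if random.random() < dup:
            arrived.append(p)           # the same packet received twice
    for i in range(len(arrived) - 1):
        if random.random() < swap:
            # two packets arrive in the opposite order to that in which they were sent
            arrived[i], arrived[i + 1] = arrived[i + 1], arrived[i]
    return arrived

print(unreliable_deliver(list(range(20))))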
A final design concept, that of connectionless IP packet delivery, will be discussed later.
We introduce the concept of edge systems, or (to use the traditional term) hosts. These are (in general) computers which are connected to the Internet -- in other words, everything that isn't a router. Your desktop computer is an edge system; our main departmental server ironbark is an edge system, as are most other servers you could name.
TCP builds the "payload" of IP packets, by slicing application data into chunks small enough to fit, with a little extra administrative overhead (ie, a TCP header), into a single IP packet. These are called TCP segments[1], thus:
[1] More formally, segments are called "Transport Protocol Data Units" or TPDUs. No one ever uses this term in relation to the Internet, though.
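The slicing itself can be sketched in a few lines of Python. This is an illustration of the idea only: real TCP headers carry much more than a sequence number, and the 1460-byte maximum segment size assumed here is just a commonly quoted value.

# A sketch of the idea only -- not the real TCP segment format.
MSS = 1460   # assumed maximum segment size

def make_segments(data):
    segments = []
    for seq in range(0, len(data), MSS):
        chunk = data[seq:seq + MSS]
        # The "administrative overhead": a small header recording where
        # this chunk belongs in the original stream of application data.
        header = {"sequence_number": seq, "length": len(chunk)}
        segments.append((header, chunk))
    return segments

for header, chunk in make_segments(b"x" * 4000):
    print(header)
# {'sequence_number': 0, 'length': 1460}
# {'sequence_number': 1460, 'length': 1460}
# {'sequence_number': 2920, 'length': 1080}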
In normal operation a TCP entity running on an edge system sends a segment, containing application data, to a remote TCP entity. The remote TCP receives the segment, and returns a special acknowledgement (ACK) segment back to the originator. Upon receipt of this ACK, the originating TCP knows that the data has been received correctly.
If a packet, containing a TCP segment, fails to be delivered then no ACK will be received. Eventually the originating TCP will time out (decide it has waited too long) and re-send the segment. With luck, and given a sufficiently low packet loss rate, the second attempt will be successful. If not, the sender can time out again, and once again resend the segment. Thus all data will eventually get delivered, although TCP does not guarantee how long it will take.
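The timeout-and-resend behaviour amounts to a loop like the following Python sketch. The channel object and its send/recv_ack operations are hypothetical, introduced only to show the control flow; real TCP is considerably more elaborate.

# A sketch of the control flow only.  The hypothetical channel object:
# send() pushes one segment into the network, recv_ack() waits up to
# `timeout` seconds for an ACK and raises TimeoutError if none arrives.
def send_reliably(channel, segment, timeout=1.0, max_tries=8):
    for attempt in range(max_tries):
        channel.send(segment)
        try:
            channel.recv_ack(timeout)
            return                # ACK received: the data arrived correctly
        except TimeoutError:
            continue              # waited too long -- re-send the segment
    raise RuntimeError("too many retransmissions; giving up")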
The connection is actually initiated by an Application Process (that is, a program in execution), which requests TCP to establish a connection to another application process running on a remote edge system. In general, the remote process is already waiting for connections.
Ports are the addresses of TCP. We can think of them as an adjunct to IP addresses: the IP address specifies a particular computer, whereas the port number specifies which process, running on that computer, we wish to communicate with.
Terminology: A process which is waiting for connections "at" a particular port number is said to be a server process. A process which initiates a connection to a server is called a client process. It's important to note the specific meaning of these words in the context of TCP/IP.
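In socket-programming terms this looks roughly like the two Python sketches below, run as two separate processes. Port 5000 is an arbitrary choice for illustration, and the client here assumes the server is running on the same machine ("localhost").

# Server process: waits for connections "at" port 5000.
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("", 5000))        # claim port 5000 on this edge system
server.listen()                # wait for clients to connect
conn, peer = server.accept()   # blocks until a client arrives
print("connection from", peer)
conn.close()

# Client process: initiates a connection to the server.
import socket

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("localhost", 5000))   # an address (host) plus a port (process)
client.close()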
The protocol section of the URL specifies HTTP, the HyperText Transfer Protocol. This protocol is associated with the well-known port number 80[2] -- that is, when we connect to a server process at port 80, we expect to "talk HTTP". The domain name of ironbark is just an alternative way of specifying its IP address, which is actually 149.144.21.60 -- see later for more on this. And the desired file on ironbark is (note Unix terminology) "/index.html".
In the simplest version of HTTP (HTTP/0.9, circa 1993), the client (ie, the Web browser) sends a line of plain ASCII text to the server process, thus:
GET /index.html
The server responds by returning the contents of the file /index.html, also in ordinary plain (ASCII) text. Finally, the browser process interprets the HTML markup in the returned file, and displays it to the user.
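The whole exchange can be carried out "by hand" with a few lines of Python. This is only a sketch of the mechanics; whether a given server will still honour an HTTP/0.9 request today is not guaranteed.

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(("149.144.21.60", 80))    # ironbark, well-known HTTP port
sock.sendall(b"GET /index.html\r\n")   # the one-line HTTP/0.9 request
response = b""
while True:
    chunk = sock.recv(4096)
    if not chunk:                      # the server closes the connection
        break                          # once the file has been sent
    response += chunk
sock.close()
print(response.decode(errors="replace"))   # the plain-text HTML of /index.html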
[2] We shall return to the topic of well-known services later, but basically all of the server port numbers below 1024 are reserved for generally agreed services.
Notes:
The telephone system is circuit-based. A telephone call reserves a channel which can carry continuous speech, reliably, in both directions simultaneously. This is immensely costly in resources: the network must be engineered to provide perfect reliability once a call (or "circuit") has been established. For these reasons, phone calls tend to be expensive. On the other hand, the "end-user" equipment is incredibly simple -- a telephone[3]. In the case of the telephone system, all of the complexity is in the network; the edge systems are trivially simple.
The Internet reverses this. The network (or delivery system) is simple, and doesn't guarantee anything, except a high probability of packet delivery. The complexity is in TCP, which exists only in edge systems. The edge systems themselves are powerful computers -- sufficiently powerful, at least, to run TCP. We can say that the end-user provides the complexity, whilst the Internet provides a basic service. We could say that this is the last Big Idea for this lecture.
It's also interesting to compare the Internet model with other, older network structures. For example, the AustPac X.25 "Packet Service" was a data transfer system available in Australia many years before the Internet. It offered reliable delivery at the network level, but was very, very expensive -- because the network core was complex. Its commercial success, whilst quite good by the standards of the day, was never, ever going to approach that of the Internet.
[3] We're talking about "Plain Old Telephone Services" here, of course. The situation changes dramatically if we include mobile, cellular telephone systems, where the handset is also very complex.