Lecture 16: Introduction to Encryption

Network Security

This is a big problem. A Real Big problem.
It can be classified into two broad areas:

Computer security in a networked context. This is the subject of a later lecture.
- Computer systems are subject to attack originating from the network
- Extends well known security approaches adopted for protection of non-networked systems and adds network-specific mechanisms
- No general solutions: any solution which provides perfect security seriously degrades usefulness of network...
Security of information in transit across the network. This is the subject of today's lecture.
- Problem is largely due to the bad guys' ability to observe (capture) packets travelling across the network, and possibly an ability to introduce spurious packets.
- Solution is usually in some form of encryption.

Network Monitoring

In most multiaccess networks, it is trivially easy for a host to set its network interface into "promiscuous mode", and capture copies of all data frames which pass across the network.

This is called eavesdropping or (in some circles) packet sniffing (or snarfing.) Once an eavesdropper has copies of all the frames he desires, he can easily view the application data they contain.

We have already seen that most current application protocols send data across the Internet as "strings of printable ASCII" -- ie, the data is sent as plaintext. It is therefore simple to observe messages transmitted by others, if one has access to an appropriate point in the network. This is the origin of the (oft repeated, and generally true) assertion that "The Internet is insecure". The solution is encryption -- encoding the message so that it is unintelligible to the intruder, but can be easily "unscrambled" by the intended recipient. This is secure communication.

An area where this insecurity can present a Really Serious Problem is password authentication. Many application protocols (eg Telnet, FTP, POP3, etc) send usernames and passwords across the network as plain text, exactly the same as other data -- ie, unencrypted. You need to always be aware of this possibility!

Encryption is a vast technical, scientific and political topic, tightly intertwined with the history of computing itself. We will look briefly at a few aspects in this lecture and the next.

Cryptography Basics

A message to be encrypted (known as plaintext) is transformed by the use of a function (or algorithm) parameterised by a key. Traditionally, the same key is used for exncryption and decryption, thus:

The security of the ciphertext depends on two factors:

The nature of the encryption method, or algorithm. It is nowadays generally agreed that open publication of details of the algorithm is a Good Thing.
The secrecy of the key. Current opinion is that, given a suitably powerful encryption algorithm, the security of the system should depend entirely on:
1. keeping the key secret, and
2. the length (usually measured in bits) of the key itself, which is usually a very good indicator of the work factor required to crack the ciphertext by trying every possible key in turn -- called an exhaustive search or brute force attack.

Substitution Ciphers

This is the simplest (and oldest) technique, whereby each character in the message is replaced by another using some rule. The order of the encrypted characters is the same as in the plaintext.

There are many examples of this technique. Most fall into the general category of monoalphabetic substitution, where the output alphabet is the same as the input. For example, in the classic Caesar Cipher, letters of the alphabet were shifted by 3 positions, hence a becomes D, b becomes E, etc. In this case, the key could be said to be "3" -- the distance by which each character was translated. A more complex example uses a "random" reordering of the letters:

plaintext:  a b c d e f g h i j k l m n o p q r s t u v w x y z
ciphertext: Q W E R T Y U I O P A S D F G H J K L Z X C V B N M

hence bad is encrypted as WQR^[1].

This type of cipher turns out to be relatively easy to break, despite the huge (26!) keyspace, by using known statistical characteristics of English (or other languages), eg:

letter frequencies - e is most common, then t, etc.
trigrams (eg the, and) are common.
Some words may be more likely in the particular context.

^[1] using the common convention that plaintext is shown in italicised lowercase and ciphertext is ITALICISED UPPERCASE.

Implementing a Simple Substitution Cypher

Whilst it's not considered even slightly useful for a "production quality" encryption system when used alone, the Exclusive-OR (or XOR) function is worth examining as a simple example of a symmetric encryption method. XOR is a bitwise (or Boolean) operator, with the following truth table:

A B A XOR B

0 0 0

0 1 1

1 0 1

1 1 0

Most programming languages provide an XOR function. To "encrypt" (for example) a byte of data, we could do (in Java, all variables of Java primitive type "byte"):

cipher = plainText ^ key;

The symmetric aspect of XOR comes from the fact that to recover the plain text, we can simply repeat the operation on the ciphertext:

plainText = cipher ^ key;

Symmetic encryption and decyption (ie, using the same algorithm and key to both encrypt and decrypt) is a characteristic of all production "single-key" systems, unlike "Public Key" systems, see next lecture. The XOR function is a key component in these "Real World" algorithms.

Transposition Ciphers

In this system, the order of the plaintext characters is changed, but the characters themselves are not -- that is, the encrypted message contains all the same characters as the plaintext. For example, a simple columnar cipher could be constructed thus:

Hence, the ciphertext is:

EATITNIHMEXNETMGMEDT

Multi-stage transposition was the basis of the famous Enigma encryption machine used by German armed forces and famously cracked by the British intelligence service at Bletchely Park in the second world war.

Neither substitution nor transposition ciphers alone are regarded as secure for serious modern use. The current approach is to combine both techniques, on binary data and with much more complex algorithms (see DES, later).

One-Time Pads

Another approach (although rarely used) is the one-time pad, (or Vernam Cipher) where a simple algorithm is used in conjunction with a key of the same length as the message, and employing a brand new key for every message transmitted.

For example, convert both the plaintext and the key to bit strings (which will be, necessarily, of the same length). Apply the XOR function function bitwise between the strings, giving the ciphertext. The recipient can then apply the same key to the ciphertext using XOR function and thus recover the original plaintext.

This is, in every respect, unbreakable, but rather impractical for real-world use in most cases. Some reasons:

The key is the same length as the message, which means that it will probably be too long to memorise, hence a written (or other mechanical) copy is needed -- potentially dangerous.
The length of the message is limited by the amount of key available.
If synchronisation is lost, subsequent messages will be garbled.

Neverthless, see s/key for a practical, working example of a one-time pad system.

DES - The Data Encryption Standard

DES is a block cipher, which operates on 64-bit data fragments, using a 56-bit key. The basic DES algorithm is described as follows:

The substitution stages in DES involve re-arranging the order of the bits from the previous stage, and using the XOR function to combine them with the key.
The transposition stages re-arrange (ie, swap) the positions of all of the 64 bits.

Note that DES is designed so that decryption is performed by the exact same algorithm as encryption, using the same key -- hence a symmetric, single key cryptosystem.

The effectiveness of DES is based on the complexity of the 19 stages. In the above diagram, two identical 64-bit plaintexts will result in identical ciphertexts. This is called the Electronic Code Book (ECB) mode of operation.

DES In Practice

The ECB mode of operation is now rarely used, since it is now generally agreed that it is breakable given sufficient resources.

In the Chain Block Cipher (CBC) mode, each block of plaintext is exclusive-ORed with the ciphertext output from the previous encryption operation. Thus, the next block of ciphertext is a function of its corresponding plaintext, the 56-bit key and the previous block of ciphertext. Identical blocks of plaintext no longer generate identical ciphertext, which makes this system much more difficult to break.

The CBC mode of DES is the normal technique used for encryption in modern business data communications. A variation on CBC is used where the message may not be a multiple of 64 bits, or where interactive (character at a time) encryption and decryption is desired. This is called Cypher Feedback Mode (CBM), and uses shift registers to permit one byte at a time to be encrypted or decrypted.

Issues with DES

DES has always been controversial:

The original (IBM) algorithm which was the basis of DES used a 128 bit key, changed by the the US government to a 56 bit key before it was issued as a standard. This made DES considerably less secure.
Many people believe that the US government has a "backdoor" (or trapdoor) decryption technique which is infeasible at 128 bits, but possible with 56. This has never been confirmed, and is probably untrue.
56 bit DES is nowadays regarded as "relatively easily crackable". A variation called triple DES (3DES) uses either two keys (112 bits -- encrypt with the first key, then the second, then again with the first) or more commonly three keys (168 bits) and is still considered sufficiently secure for production use.
Other common encryption algorithms in current use include 3DES, IDEA, Blowfish, RC2 and RC5. These are all considered to be appropriate for secure communications.
The US Government has mandated (in 2000) a new encryption algorithm which is intended to replace DES, but it is apparently not yet in wide use.

Some good Internet resources on cryptography are available here La Trobe Uni Logo