Recently I visited Bletchley Park, the location where Alan Turing lead the team that cracked the codes transmitted using the German Enigma and Lorenz Cipher during WWII. Learning about how the enigma worked, and how rudimentary it seemed to me got me thinking about how people of the future may think the same about current cryptographic methods. It’s easy to think of the Enigma to be a rudimentary machine without considering the computing power that was available at the time. The oldest working computer in the world (Harwell Computer), located at Bletchley Park, would take a year to perform a number of operations that a modern day smart phone could calculate in 100 ms.
The Harwell Computer
How did the Germans encode messages?
The enigma machine was used by many people in the German military to send and receive coded messages. It works by substituting pairs of letters. The substitution rules are initially defined by the configuration of the 3 rotors on the enigma, along with the wire configuration at the front. However, after each press of a letter, the rotors would increment, thus changing the substation rules for the next next key press. This made it hard to decode, as a letter didn’t just represent another letter; the letter pairs were dynamic. The initial configuration was defined for each day, and on each communication channel, and set up by checking a code book. This means that once a code was cracked in a day, the the rest of the messages for that day could be decoded.
Most of the decoding work done by Alan Turing worked on the basis that the Germans were quite habitual about their communications. For example, at a particular time in the day, some messages would start with “Wettervorhersage”, meaning weather forecast in German. This could then be used as a reference point to check if their decoding efforts were correct. Various machines were designed specifically to decode enigma messages, and could work quite efficiently by using parallel calculations.
The Lorenz Cipher was used for more sensitive information by the German High Command to transmit strategical information. It was much more complex than the enigma. When a message was transmitted, rather than a code book for reference, the encoding configuration was transmitted unencrypted as a header, at the start of the message. Work to decode this cypher only began after a message from Athens to Vienna was, despite being forbidden, transmitted twice using the same encoding configuration. Whilst the header was the same, the messages were slightly different due to abbreviations. This allowed the two same-key encrypted messages to be compared, and the logical structure of the cipher could be determined. This took months of ingenious work. The Colossus computer, the first programable electronic computer, was designed and employed to crack all future Lorenz machine codes, and shaved years off the war’s length.
A small view of part of Colossus
Current Cryptographic Methods
In modern cryptography, typically a hashing algorithm is used to help verify the integrity of the data or the sender, and encryption algorithms are used to obfuscate the data.
As discussed in a previous post about bitcoin, hash functions are a set algorithm that produce that same result every time when given the same input. A piece of data is passed through the algorithm, and a hash is outputted. A good hashing algorithm will produce a hash that is unique to that piece of data, and changing the original data even very slightly, will result in an entirely different hash. However, due to the fact that the hash contains less data than the original inputed data, it is a mathematical certainty that duplicate hashes occur, but are often never discovered (this depends on the quality of the hash function). These duplications are called collisions. Another feature of hashing algorithms is that they cannot be reverse engineered to the original piece of data without using an unreasonable amount of computing power and time.
SHA (Secure Hashing Algorithm) is the common name for a family of cryptographic hash functions developed by the NSA. The first method (SHA-0) was discovered to have a security flaw, and was quickly abandoned for an appreciated version (SHA-1). Further development lead to the creation of the SHA-2 algorithms (SHA-224, SHA-256, SHA-384, and SHA-512). These are commonly used today for a variety of applications, including SSL/TLS encryption, PGP (Pretty Good Privacy) and Bitcoin.
So what is the point in converting a long piece of data into a smaller piece of data in the form of a hash? One common application is to verify the integrity of a file. If a hash is provided with a file in transit, then the algorithm can be run at the file’s destination, and compared to the hash supplied, to check that the file remains intact. The MD5 algorithm is often used for this application. In cryptography, hashing is very useful for ‘signing’ a piece of data.
A digital signature algorithm uses public and private key pairs in conjunction with a piece of data to prove that the data was produced by the person signing it. This can be used to check that the data hasn’t be intercepted and modified by a third party. By running the computationally expensive digital signature algorithm with a small 20-byte hash, rather than gigabytes of data, digital signature can be produced in a reasonable amount of time. Digital signatures rely on the fact that it is not feasible to find two of the same hash values, so signing the hash instead of the original data is sufficient.
Whilst hashing algorithms obfuscate the original data, they are not sufficient for the encryption of data, since a hashing algorithm is unfeasible to reverse engineer, and thus the receiver cannot read the original data. Hashing data for the purpose of digital signatures to verify data integrity, in combination with encrypting data allows for a comprehensive solution for data encryption and validation.
RSA, named after the initials of the three people who invented it, is a set of algorithms that perform two functions, encryption and digital signatures. RSA uses a private and public key as parameters to decrypt and encrypt respectively. Similar to the hashing function, the RSA algorithms are set pieces of code that do not change for anyone, but instead of having just the input data as a parameter, a pair of keys are also used.
Symmetric encryption is a basic method, in which the private key and the public key are the same. In this case, the encrypter and decrypter need to share their secret key between them and not allow anybody else to know it. This means a shared secret must be known before unobservable data transfer can occur. The enigma worked symmetrically with the use of a shared code book to set up the machine at either end of the transmission (encryption or decryption).
Asymmetric encryption employs a private key, and a public key that is derived from the private key. They are algorithmically linked. The advantage of this system means that messages can be encrypted for an intended target without needing to know any secrets beforehand. Similar to the hashing function’s requirements to reverse engineer the hash to original data, attempting to derive a private key from a public key is unfeasible due to its computational difficulty. Creating a pair of keys that are linked, but calculating the private key from the public key unfeasible requires a lot of complicated mathematics. Using this system, a message can be sent securely without their being any pre-shared secrets.
Whilst asymmetric encryption is more mathematically difficult than symmetric encryption, it simplifies the key distribution system. For example, with symmetric encryption, if person X in a pool of 100 people needed to communicate to everybody in the pool individually, a key would needed to be assigned for each of the other 99 people and shared between person X and the the corresponding people. If all 100 people needed to encrypt messages to the other 99 people, 4950 keys would have to be assigned to accommodate for each pair of people (sum of 1…99). This number becomes unmanageable the larger the number of people there are. In comparison, with asymmetric encryption with a pool of any size, everybody needs to remember their own private key, and everybody else needs to lookup the public key of their intended recipient.
RSA is not a perfect algorithm. With a key pair with a size of 1024 bits, it can only encrypt a message up to 117 bytes in length. This is pretty short and not suitable for most needs. To solve this problem, hybrid encryption is employed. This uses RSA to encrypt a small section of the data called the session key. The session key and the rest of the data are then encrypted symmetrically resulting in an encryption of the complete data.
RSA and SHA are not the only encryption and hashing algorithms in use. There are others such as DSA, ECDSA, AES, 3DES and ElGamal which are used for hashing, encryption and negotiating keys.
The Future of Cryptographic Methods
The key to the success of current cryptographic methods is due to the expense and time that would be required to guess the keys needed to decrypt a message. If a random number generator was used to brute force calculate a key, it would simply take too long (think many years, or forever). In theory it could guess it straight away by chance, but the same is true of the details used to login to an online bank. However, every unfeasible calculation could be calculated if a powerful enough computer existed, or if enough normal computers were working in unison.
Quantum computers are predicted to make current common encryption methods worthless, putting online transactions, file confidentiality or even clandestine networks at risk to exposure. Shor’s algorithm is a quantum algorithm (designed to be run on quantum computers) that solves for integer factors. On a classic computer, prime numbers are very difficult to calculate as there are no shortcuts; every number must be factored to verify if it is a prime. Prime numbers play a huge part in the mathematics of the RSA algorithm (and others). Quantum computers speed up these calculations exponentially. There are commercially available quantum computers available already. However, current quantum computers do not have enough qubits (quantum bits) to factor large numbers. The largest number that has been factored is 21 (October 2012). Seth Lloyd, professor of quantum mechanical engineering at MIT, said “I don’t think we’re likely to have the type of quantum computer (that solves this problem) within at least five years, in the absence of a significant breakthrough maybe much longer”. However, the first organisation to solve this problem, likely a government three letter organisation, is not going to tell anybody for years, as they will have the skeleton key to secrecy, and will want to keep that to themselves.
Due to the impending problem that cryptography faces with the development of quantum computers with more qubits, work has been undertaken on post-quantum cryptography. These are algorithms that are not more efficiently breakable by a quantum computer than a normal computer.
A quote that is often, perhaps erroneously, attributed to theoretical physicist Richard Feynman, says “If you think you understand quantum mechanics, you don’t understand quantum mechanics”. I don’t think I understand quantum mechanics at all, so any writing about it ends here.
Alan Turing: The father of code-breaking