With ever increasing communication via the internet and its decentralised nature, it’s important to protect the data that we sent through it. Sadly, not everyone is nice and there are malicious people out there willing to steal your data, such as gaining access to your online banking. To prevent this happening, there are multiple tools developers can use. You will never have a 100% guarantee that the browser you are using is secure, or that a mobile app uses SSL connection, you can only trust them that they use the correct tools and use them properly, but it’s better to trust that it’s secure than knowing it’s not.
In this article I want to describe some of the most common types of ciphers and algorithms that are used today. These algorithms are considered unbreakable, unless I state otherwise. Some are even used by NATO forces or government agencies such as CIA, MI5 or Czech BIS. Just because a cipher is considered unbreakable, this does not mean it is 100% unbreakable, just that all known attacks against it are brute-force style attacks, or at least very computationally expensive.
Ciphers have been present for as long as people have engaged in politics and war. When you have enemies, naturally you want to hide some information from them. When two army camps wanted to communicate, they had to send messengers with plans. If the messenger gets captured and your secret gets exposed, your plans can end in a catastrophe.
One of the most famous ancient-era symmetric ciphers was Caesar's cipher. It is named after Julius Caesar (100-44BCE), although there are records of this cipher being used earlier. This simple cipher shifts each letter in a word by several places (alphabetically, wrapping around), where the key is the number of places to shift. For example, with the key K=3, A becomes D, B becomes E and so on. When deciphering, you do the opposite operation, C becomes Z, D becomes A… Text "ATTACK AT DAWN" with key 3 gets encrypted to "DWWDFN DW GDZQ".
It wasn’t until the 20th century that cryptography started to take off. Once people started using electricity to send messages, whether wired or wireless, they realised that anyone could listen in if they have the right equipment. Thus began the era of modern cryptography. In the Second World War, the Germans used the Enigma machine to encrypt their messages. It was considered uncrackable, but the combined work of Polish cryptographers and, later, Alan Turing broke it. The major flaws in Enigma were that a letter could never become itself and that the Germans used to send very structured messages; for example, a weather report begun with the phrase “Weather report”, thus allowing cryptographers to try and match the structured words and find the Enigma key for the day.
These ciphers protect data by encrypting plain text into cipher text, where the resulting output is unreadable to anyone except those who have the key for decryption. There are two main groups, symmetric and asymmetric algorithms. Symmetric algorithms use the same key for encryption and decryption; asymmetric uses one key for encryption and another for decryption.
Symmetric ciphers have the advantage of being fast and easy to implement, but the disadvantage that the keys need to be exchanged before secure communication can begin. Historically, the Data Encryption Standard (DES) cipher was used, but its small key length made it obsolete in the 1990s. There was a way to prolong its life called TripleDES, which was the DES cipher run three times, however even that is now considered old. The modern way is to use the Advanced Encryption Standard (AES) cipher. AES is very quick to encrypt and decrypt and at this moment it is considered unbreakable. The only feasible option for breaking this cipher is using a brute-force attack, with some improvements that reduce the search space (for 256-bit key from 2256 to 2254).
AES splits the data in blocks of 128 bits and uses 128-, 192- or 256-bit keys. Each block is a 4x4 matrix of bytes and each goes through several rounds of transformation, consisting of byte substitution, column mixing, row shifting and XOR-ing the block with the key. In order to improve security, there are several modes that AES can operate in. The most basic one is the Electronic Code Book (ECB) mode. This mode encrypts each block with the key, but this has the big security flaw that if two blocks are identical, they will have identical cipher text. Therefore, it is recommended to use different modes, such as CBC, CFB, CTR and others. These usually have an additional parameter - initialisation vector (IV), which is the same length as the key and can be thought of as secondary key. These modes handle each block differently, they might first XOR the key with the output of the previous block, resulting in a different output even for same input blocks.
You can clearly see the image outline when ECB mode is used.
Having a single key for both encryption and decryption can be useful, but sometimes you need to exchange information on an insecure channel - this is where asymmetric ciphers come in. They have 2 keys - one public, one private. The public key is public - everyone can see it and use it. The private key can never be published, otherwise the entire keychain is compromised, and new keys must be generated. The process is very simple. Person A encrypts the plain text with B's public key and sends it to person B. Person B uses their private key to decrypt it. No one, apart from B can decrypt, because only B knows their own private key. Everyone can know B's public key, but that will only allow them to send encrypted messages to B.
The most used asymmetric algorithm is RSA, named after its inventors, Rivest, Shamir and Adleman. This cipher works with prime numbers and modular exponentiation. Its strength lies in picking two prime numbers, p and q and then computing n = p * q. This n is exposed as part of public key and is secure, due to the very high computational complexity of factoring a number into its prime factors (n is in fact a 2048- or 4096-bit number). In 2019 researchers factored a 795-bit number in 900 core-years of computer power, and with every bit it gets harder and harder. If someone comes up with an efficient integer factorisation algorithm, RSA will become obsolete and the world will have to switch to other public-key cryptography algorithms, such as elliptic curves.
AES is a great algorithm, but its only weakness is the need to exchange keys. Without a secure channel, an attacker can listen and be able to intercept the key and decrypt every message. Luckily there are ways to exchange keys securely over insecure channel. One such algorithm is Diffie-Hellman (DH), which was published in 1976, however nowadays it is not commonly used. Instead, RSA and certificates are the industry standard. In simply put, DH works with modular arithmetic and its strength is in its computational complexity.
A perfect analogy for DH is the mixing of colours. Red and blue make purple and it's easy to mix them, but if someone gives you bucket of purple paint, how do you know how much red and how much blue was used? And which shade of red and blue? That is why this picture using paints perfectly describes how Diffie-Hellman works:
Modern day key-exchange is done via RSA and certificates. As the RSA section above mentioned, it's base are two prime numbers which are kept secret. From them several parameters are calculated, and some are published - published in the form of a public certificate. Certificates make a chain, each certificate is signed by a certificate one level higher than itself, up to the root certificate. There are hundreds of root certificates in the world, and they are one of the best kept secrets of the internet. These root certificates are implicitly trusted – the trust list is managed by your operating system or your web browser. When a (decent) library or browser attempts to make a RSA connection, it requests certificate from the other party and checks if it is valid - expiration date, domain name, certificate chain, revocation status... and if all checks pass, it will initiate RSA communication and inside it's secure channel, it will exchange the key.
In certain situations, it's useful to create some sort of fingerprint (checksum) for any data, whether it is a file, an image, or a program. This is where the hashing functions come in. Hashing is a one-way transformation of any length source data into a hash, a fixed size stream of bytes. Hashing functions must have two properties to be considered good:
Historically, MD5 was used for these purposes. Released in 1992, it was widely used for password storage, checksums, and other uses. In 2005 researchers found ways to create collisions for any MD5 hash. The Czech security researcher Vlastimil Klíma found a way to create any collision in few hours on an ordinary laptop computer. A few months later he improved the algorithm and reduced the time to find an MD5 collision to one minute. Apparently, he broke MD5 while he was away at his weekend cottage. MD5 is considered cryptographically broken and should not be used for any security related tasks.
The Secure Hash Algorithm (SHA) is the successor to MD5 when it comes to hashing algorithms. There are several versions, ranging from SHA-0 to SHA-3, each having multiple variants, each with a different output size. SHA-0 and SHA-1 are not recommended for use. Collisions were found; what is recommended is to use at least SHA-2 with 256 bit output or any of the SHA-3 variants.
The primary functions of hashing are to create fingerprints of data (e.g., when uploading a file, to check if it already exists) and to securely store passwords. Passwords should never be stored in plaintext, instead they should he hashed and salted (I’ll explain later). When a user tries to log in, their password gets hashed and the hashes are compared - if they match, they are let in. By using a secure algorithm, this is safe, even if the hashes get leaked, since the attacker does not know the plaintext password and can’t easily create another plaintext which hashes to the same value.
When storing passwords, it’s not a good idea to just take the plaintext password, hash it and store it. There are lookup tables of the most commonly used passwords and their hashes, so an attacker can just look up leaked hashes to get the plaintext passwords. The solution is to add some randomness, by adding some salt.
Salt is a random string of characters, unique per user, with sufficient length (20+ characters, GUID’s are perfect for this) which can be stored alongside the hash in plaintext form. This salt is concatenated with the plaintext password, and both are hashed as one. Because every user has a different salt, even if two users use the same password, their hashes will be different, therefore lookup style attacks are impractical. An attacker would have to take the leaked hash and salt and try to calculate the hash of every possible password concatenated with salt and compare it to the leaked hash, which takes a lot of computing power just to break one password.
For passwords, it’s recommended to use the bcrypt hashing function. This function has an adaptive iteration count, which has an effect on hashing speed. If the iteration count is chosen right, it causes the bcrypt function to take 1-2 seconds to complete, which is perfectly fine when the user is trying to login with their password, but for brute-force attacks it makes it incredibly expensive, timewise.
To understand why MD5 and hashing passwords without salt is a bad idea, try googling this MD5 hash:
These two works on pretty much the same principle. The gist of it is the same - to check integrity and authenticity of the data. In this section we will look at HMAC and digital signature.
HMAC, short for Hashed Message Authentication Code, is an algorithm to verify the integrity of data. It works by first selecting a key and then running it through this algorithm:
HMAC(KEY, MESSAGE) = HASH((KEY ^ O_PAD) + HASH((KEY ^ I_PAD) + MESSAGE))
where O_PAD and I_PAD are outer and inner padding (repeated values of pre-defined bytes), ^ is bitwise XOR and + is concatenation.
HMAC is then sent alongside the message. The recipient can take the message, run their own HMAC calculation and compared it to the received HMAC. If they match, they know the message was not tampered with. For this, they need to know the key, so some form of key exchange must happen before that.
Digital signature works similarly, instead of using shared key like HMAC, it uses RSA. To sign a message, the hash is calculated and encrypted with the sender’s private key using RSA. The message and the signed hash are then sent. The recipient checks if HASH(message) == RSA_DECRYPT(ReceivedHash, PublicKey) and if they match, they know the message is authentic. However, for this to be 100% safe, the recipient must have the public key of sender, e.g., you must send your public key to your bank, so that when you send them a digitally signed message, they can verify that it was from you.
Cryptography is all around us. Anytime you access a website via HTTPS, your computer uses several of these algorithms to securely transfer data. Someone is always listening and that is especially true on the internet. Your request moves across multiple servers and there is no knowing if some server is compromised or not. We can only believe what security researchers, operating system programmers, and library maintainers tell us. We might never know if some cipher has a backdoor built in, or if it was cracked by some secret government agency. We can only believe!