It used to be that if someone robbed our house, we would be in dire straits. Now, if someone takes over our online life, they can do far more damage than that.
The Internet started as a trusting place. It was funded by the US Defense Department (through DARPA) to allow researchers to communicate with each other. This led to some remarkable things. For one, a router on the Internet happily sends data from one computer to the other with no taxation, tracing, etc. Every computer trusts the others in that manner. You may have heard of this thing called spam. That is possible because, in just the same way, you can set up a mail server on a computer and proceed to send email to other computers. And they happily accept it. Again, it was all designed under the notion of trust.
Then came things like online banking, shopping, and signing into web sites like ours, and the need for security came front and center. A breach of trust here can be quite serious. Yet most of us don't seem to have a good understanding of how such security works. If you don't know the answer to this question, you need to read on: why is it that if you forget your password, the web site allows you to reset it but never tells you what it is?
Let's start with the most common solution, email address and password. The email address by itself provides little to no security since we give it to so many web sites. And if we are not careful, we include it in the open in what we write on the web (always try to avoid this, as it is a sure way to get a ton of spam). It is the password that provides the security. But how does it do that?
Cryptography
Before we understand that, we need to understand cryptography. This is the science of hiding secrets. As you can imagine, the original need came from warfare, but it is a generic tool. The idea is to hide information from everyone but the two parties involved. We do this because the connection between the two parties is "open." That is, anyone can snoop and read what goes back and forth, but we don't want them to figure out what the messages mean. Recall that the Internet is a trusting place, where non-trusted boxes routinely pass data from one computer to another. We want to safeguard against a technician tapping into one of those routing boxes and capturing everyone's data.
One way to do that is with secrecy. I tell you in private that whatever I type has its characters reversed. So when I say "koob" I really mean "book." We call this security by obscurity. Reversing characters is not that obscure of course, but we can cook up much more complicated schemes. For example, we could each have an identical table that takes one letter of the alphabet and gives us another one. As long as we both have this code book, we can encode and decode messages.
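To make the two schemes concrete, here is a small sketch in Python. The reversal is exactly the "koob" trick; the code book is a made-up table of my own (each letter maps to the one three places later), chosen only to show how a shared table encodes and decodes.

```python
import string

# Hypothetical shared "code book": each letter maps to the letter 3 places later.
ALPHABET = string.ascii_lowercase
SHIFTED = ALPHABET[3:] + ALPHABET[:3]
ENCODE_TABLE = str.maketrans(ALPHABET, SHIFTED)
DECODE_TABLE = str.maketrans(SHIFTED, ALPHABET)


def reverse_scheme(message: str) -> str:
    """The 'koob' scheme. Reversing twice gives back the original."""
    return message[::-1]


def encode(message: str) -> str:
    """Look every letter up in the shared code book."""
    return message.translate(ENCODE_TABLE)


def decode(message: str) -> str:
    """Run the code book backward to recover the message."""
    return message.translate(DECODE_TABLE)


print(reverse_scheme("koob"))    # book
print(encode("book"))            # errn
print(decode(encode("book")))    # book
```

Notice that everything secret about this scheme lives in the table. Anyone who gets a copy of it can run `decode` just as well as I can.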
Seems like a foolproof method, yes? Well, not quite. Let's say Amazon servers have this code book and so do I, and one of their employees leaves and takes a copy of the table with him. He can now either publish it on the web or use it himself to break every message I send. I could cook up far more complicated schemes than the table above and they would still be easily broken with one breach of "physical security."
The modern approach in the last two to three decades has been to use open cryptography. That is, we tell the world what the mathematical encryption process is. If we are reversing characters, we would clearly tell the world that is what we are doing. What is the reason? To get the benefit of the brain power of everyone in the world to create the best version of these schemes. I can't tell you how many "unbreakable" systems have been broken by a single individual who thought to look for a hole in the scheme and found it. It is best to have that smart person break the scheme before making it final and putting it in products.
The most well-known version of such a scheme is called the Advanced Encryption Standard, or AES for short. As mentioned, you can look up its complete operation and download software that implements it in any form or fashion you wish. This is the encryption that is used on Blu-ray discs, for example. It is also the scrambler that is typically at work when you see "https" in your browser address bar, indicating a "secure connection." AES was standardized in the US in 2001 and is by far the most common "bulk encryption" (i.e. what encrypts the data as opposed to passwords) in the world.
How good AES is depends on its key size. Let's say the key is just a single letter. It would not be hard to find someone's encrypted data and try every character until the message becomes human readable (e.g. English). Make the key size 256 bits, and now you have 2^256 variations, or about 10^77. That is not far short of common estimates for the number of atoms in the observable universe. Regardless, it is a really, really big number. The cost is that it takes more computational resources to encrypt data with longer keys.
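The arithmetic is easy to check for yourself. The sketch below counts the possible keys for a few key sizes and estimates how long it would take to try them all. The guessing speed is a number I made up purely for illustration, so read the years as a sense of scale and not as a measurement.

```python
def keyspace(bits: int) -> int:
    """Number of distinct keys for a key that is `bits` long."""
    return 2 ** bits


# Hypothetical attacker speed, chosen only for illustration.
GUESSES_PER_SECOND = 10 ** 12
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

for bits in (8, 64, 128, 256):
    keys = keyspace(bits)
    years = keys / GUESSES_PER_SECOND / SECONDS_PER_YEAR
    print(f"{bits:>3}-bit key: {keys:.3e} keys, about {years:.3e} years to try them all")
```

Every extra bit doubles the number of keys, which is why going from 128 to 256 bits does not double the attacker's work but squares it.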
With this background, let me answer the question I gave you at the start. Web sites need to remember your password so that when you type it on your next visit, they can verify it. But that is a problem. If I store the passwords in a database "in the clear" (i.e. not encrypted), then that rogue employee can steal them and defeat the system. It matters not if I use one or a million bits for my AES key. Once someone has the password itself, there are no barriers left to unauthorized access.
The website could encrypt the passwords with a key it selects. But that key is vulnerable because some programmer needs to embed it in software, and that software can be stolen too.
The solution is a clever one: we use the key to encrypt the key itself! Normally we use the key to encrypt the data we want to transmit back and forth (e.g. your banking information). But here, when you set up a new account online and select a password, the password is fed to the encryption engine, e.g. AES, as both data and key. What comes out is encrypted data that can only be reversed if one has the key, i.e. the password. (Real sites typically use a purpose-built one-way function called a hash rather than AES itself, but the idea is the same.) Sometime later you try to log into the system and give the system your password. The website software repeats the process: it encrypts the password with itself and out pops the encrypted data again. It then compares this encrypted data to what it had stored in its database. If the two match, you have typed the right password. If they do not, you have not.
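Here is a minimal sketch of that store-and-compare loop in Python. I am assuming SHA-256 from Python's standard library as the one-way scrambler in place of the AES step, since AES does not ship with Python. The `database` dictionary and the `register` and `login` names are made up for illustration, and a real site would add more protection than this.

```python
import hashlib
import hmac


def scramble(password: str) -> str:
    """One-way scramble of a password. SHA-256 is an assumption here,
    standing in for the 'encrypt the password with itself' step."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical user database: email address -> scrambled password only.
database: dict[str, str] = {}


def register(email: str, password: str) -> None:
    """Store the scrambled form; the password itself is never kept."""
    database[email] = scramble(password)


def login(email: str, password: str) -> bool:
    """Scramble what was typed and compare it with what was stored."""
    stored = database.get(email)
    if stored is None:
        return False
    # compare_digest takes the same time whether or not the strings match.
    return hmac.compare_digest(stored, scramble(password))


register("reader@example.com", "d0g111!$")
print(login("reader@example.com", "d0g111!$"))   # True
print(login("reader@example.com", "dog"))        # False
print("d0g111!$" in database.values())           # False
```

The last line is the important one: the database holds nothing that looks like the password, so there is nothing for the site to read back to you.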
Now if someone steals the password database, as is routinely done on web sites today, they in principle have nothing. They see a list of email addresses and encrypted passwords. Unless they find a way to decrypt the passwords, which by definition requires the password itself, they are stuck.
This is why the web site software can never tell you what your password is. It does not know it! It only knows the encrypted version, which is no good to you because you need the password as it was before encryption. What the site can do is delete your password and force you to set up a new one.
How Secure Are Your Passwords/Some Rainbows Are Not Pretty
This is a very deep topic but I want to share some key aspects that have everyday applicability.
Remember that the number of possible keys grows exponentially with key size: every added bit doubles it. Assuming any binary value is acceptable, a 64 bit key for example will allow 2^64 or about 1.8 x 10^19 (roughly 18 followed by 18 zeros) variations. Trying that many combinations for each user is going to take a long time. What hackers do is take shortcuts. For example, they know humans don't like to remember random things, so instead of going through every binary value, they use a dictionary. If your password is "dog" or "flower," it will be found a heck of a lot faster than "d0g111!$" which is not in any dictionary.
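A dictionary attack is short enough to sketch in a few lines. As before, SHA-256 is my stand-in for whatever scrambler the site uses, and the five-word list is a toy; real word lists run to millions of entries.

```python
import hashlib
from typing import Optional


def scramble(password: str) -> str:
    """Assumed one-way scrambler (SHA-256), for illustration only."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical word list; a real one is vastly longer.
DICTIONARY = ["cat", "dog", "flower", "password", "sunshine"]


def dictionary_attack(stolen: str) -> Optional[str]:
    """Scramble each word and see whether it matches the stolen entry."""
    for word in DICTIONARY:
        if scramble(word) == stolen:
            return word
    return None


print(dictionary_attack(scramble("flower")))     # flower
print(dictionary_attack(scramble("d0g111!$")))   # None
```

The attacker never reverses the scrambling. They simply run it forward on every likely guess until one of them matches.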
As you know, many web sites force you to use numbers and special characters in your passwords. This immediately pushes you out of reach of simple dictionary attacks. But how about Amazon123? Likely many people would pick such a password, yet these variations don't exist in any dictionary. The "brute force" solution to this is to try passwords from "Amazon" to "Amazon999999999999999." For example, you would try Amazon1, then Amazon2, then Amazon3, and so on through Amazon11, Amazon12, and eventually Amazon123.
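The "word plus some digits" search can be sketched the same way. The function names are mine, SHA-256 is again an assumed stand-in for the site's scrambler, and I stop at three digits to keep the run short.

```python
import hashlib
from itertools import product
from typing import Iterator, Optional


def scramble(password: str) -> str:
    """Assumed one-way scrambler (SHA-256), for illustration only."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def candidates(base: str, max_digits: int) -> Iterator[str]:
    """Yield base, then base followed by every run of 1..max_digits digits."""
    yield base
    for length in range(1, max_digits + 1):
        for digits in product("0123456789", repeat=length):
            yield base + "".join(digits)


def brute_force(stolen: str, base: str, max_digits: int) -> Optional[str]:
    """Try every candidate until its scrambled form matches the stolen one."""
    for guess in candidates(base, max_digits):
        if scramble(guess) == stolen:
            return guess
    return None


print(brute_force(scramble("Amazon123"), "Amazon", 3))   # Amazon123
print(sum(1 for _ in candidates("Amazon", 3)))           # 1111
```

Three digits after one word costs only 1,111 guesses. Each extra digit multiplies the work by ten, which is manageable for a while and then is not.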
The problem with this scheme is one of data size. To avoid redoing the work for every stolen database, an attacker would like to scramble every guess once and keep the results in a lookup table. With 256 bit values, such a table just for guesses starting with the word Amazon could get quite huge. The solution is something called a "rainbow table." The idea here is rather simple: instead of storing every guess next to its scrambled form, we store only the start and end points of long chains of guesses, along with a rule for stepping from one guess to the next. I gave examples of this kind of stepping above. We know that if we are going to guess Amazon with two digits after it, the guesses run from Amazon00 to Amazon99. So why store all of the values in between? We make sure the productive (i.e. common use pattern) guesses are covered by the table and let the system recompute the values in between when needed. The name is usually explained by the picture of the table: each step along a chain uses a slightly different rule, like the bands of color in a rainbow. Because the rules are known, everything between a start point and an end point can be regenerated and does not need to be written down.
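The "store the end points, compute the values in between" idea can be sketched in toy form. This follows the usual textbook description of a rainbow table as I understand it: each chain alternates scrambling a guess and mapping the result back to a new guess. The chain length, the `reduce_to_guess` rule and the tiny Amazon00 to Amazon99 guess space are all my own choices for illustration, and SHA-256 is again an assumed scrambler.

```python
import hashlib
from collections import defaultdict
from typing import Optional

CHAIN_LENGTH = 5  # scramble-and-reduce steps per chain (arbitrary for this toy)


def scramble(password: str) -> str:
    """Assumed one-way scrambler (SHA-256), for illustration only."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def reduce_to_guess(digest: str, step: int) -> str:
    """Map a digest back into the guess space Amazon00..Amazon99.
    Mixing in the step number gives each column its own rule."""
    return f"Amazon{(int(digest[:8], 16) + step) % 100:02d}"


def chain_end(guess: str, first_step: int = 0) -> str:
    """Walk a chain from `guess` at column `first_step` to its end point."""
    for step in range(first_step, CHAIN_LENGTH):
        guess = reduce_to_guess(scramble(guess), step)
    return guess


def build_table(starts: list[str]) -> dict[str, list[str]]:
    """Keep only end point -> start points; the middle is thrown away."""
    table = defaultdict(list)
    for start in starts:
        table[chain_end(start)].append(start)
    return table


def lookup(table: dict[str, list[str]], stolen: str) -> Optional[str]:
    """Assume the stolen digest sits in each column in turn, walk to the end,
    and if that end point is stored, replay the chain to find the password."""
    for column in range(CHAIN_LENGTH - 1, -1, -1):
        end = chain_end(reduce_to_guess(stolen, column), column + 1)
        for start in table.get(end, []):
            candidate = start
            for step in range(CHAIN_LENGTH):
                if scramble(candidate) == stolen:
                    return candidate
                candidate = reduce_to_guess(scramble(candidate), step)
    return None


starts = [f"Amazon{n:02d}" for n in range(0, 100, 10)]
table = build_table(starts)
print(lookup(table, scramble("Amazon30")))   # Amazon30 (a stored start point)
print(lookup(table, scramble("d0g111!$")))   # None (outside the guess space)
```

The table holds ten start points, yet each chain passes through several other guesses that can be recovered by replaying it. How much of the guess space a table covers depends on the chains, which is why real tables are built with many long chains.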
What we have done here is optimize the CPU power it takes to find passwords by going after common patterns, and reduce the amount of data it takes to store all the common variations. The upshot is that given a password file, there is software you can download right now that in hours or days will spit out more correct passwords than you have time to exploit! Yes, you read that right. Software that uses all of these schemes is a Google search away.
So as you see, the system in use is weak. The best you can do is make your passwords long and obscure, and change them often. And keep your online accounts to the fewest vendors possible. Personally, I am sticking with Amazon for almost all of my purchases. Even if I have to pay more, I buy through them. Many vendors sell through Amazon even if they have their own e-commerce site.