One Way to Hack

This is a post from a series in my project.

In this age of information, cybersecurity is everything. It’s the reason why we have things like online banking and secure payment transactions for online shopping. Without certain security measures, your information is open on the web for anyone to grab. Cryptography is such a broad and deeply researched subject – this short post will not do it justice by any means.

Aside: The terms hacking and cracking have extremely different meanings, although they are usually taken to mean the same thing. Hacking pertains to making new discoveries, like scientists do when they try to cure cancer. Cracking means making discoveries for malicious intent, like people who steal your identity by giving your computer a keystroke-logging virus to steal your passwords or credit card numbers.

What is Cryptography?

From the Greeks (as usual): crypto means secret and graphy means writing. Cryptography is the study of techniques for encrypting and decrypting secrets. There are many ways to code messages or information so that they are unreadable by others. Some of these techniques are easily broken, and some are hot research topics because they are so difficult to break. In 1977, three number theorists (Ron Rivest, Adi Shamir and Leonard Adleman) came up with an algorithm, called “RSA”, for creating unbreakable computer “keys.” The basic premise for their idea is summarized below:

Imagine you want to send a secret message to your friend Alice. You put the message in a box that has two keyholes and one key is taped to its side, key A. Whenever the box is locked with key A, it can only be opened by key B. Similarly, if you lock the box with key B, you can only open it with key A. Key A is public. Anyone can use it (it’s taped to the box). Key B is private. Only your friend has access to this key. Here is the procedure for securing your message:

  1. Lock your box with key A.
  2. Send the message to your friend. Since only key B can open the box, no one can look inside except your friend.
  3. Your friend receives the box and opens it with key B, fully confident that no one else has read the message.

In this picture, Eve is trying to read the message between Alice and Bob, but she can’t do it unless she has the private key.

The trick in this method is in the kinds of keys that you have. First of all, they are digital keys, since all of this is taking place on your computer. RSA-generated keys are created by multiplying huge (like thousands of digits) prime numbers together to make another, bigger number. Just a refresher: prime numbers are special numbers that cannot be divided by any other number than themselves and 1 (e.g., 1, 3, 5, 7, 11, 13, 17, 19, 23, …).

Here is the basic idea:

  1. Multiply two large prime numbers n=p,q
  2. Next, find k = (p-1)(q-1)
  3. Find a number e that is between 1,k and not a factor of k. In other words, 1 \leq e \leq k.
  4. Calculate d,v so that de - vk =1. Don’t worry: a certain theorem of number theory says that d,v will always exist.
  5. Finally, we erase what we have for p,q,k,v
  6. (n,e) forms the public key and (n,d) the private key.

To illustrate this: the epitome of nerdiness. A specific example follows.

  1. Choose p=11 and q=13. We have n=143 and k=120
  2. Let us choose e=7
  3. We have to find d,v so that 13d - 120v = 1. We have d=103 and v=6
  4. The private key is (143,7). The public key is (143,103)

The encryption process takes a message M, which is a number from 1, ... n-1, and transforms it by the following operation:

M^e \mod n = E

Here, the \mod operator represents the modulus. A modulus is the remainder r after division (e.g., \frac{5}{3} = 1 with r=2  and  5\mod 3 = r =2).

For example, the public key (143,7) and message M = 72, becomes E=72^7 \mod 143 = 19.

Undoing this process to read M is called decryption. To decrypt, we have to compute:
E^d \mod n = 19^{103} \mod 143
= ((19^{10})^{10} \mod 143) * (19 \mod 143)
= (56^{10}*19^3) \mod 143
= (100*138) \mod 143
= 72

We have the original message again. Obviously, there are a lot of steps to “secretize” your information, but they are worth it. If you only have the public key (n,e), it is impossible to figure out what the original message is. But, if you know what the factors of n are (p,q), then you can easily back-calculate all the steps required to find the private key (n,d). There is one caveat to this method, though. Public and private keys are for one-time-use only. After you send the message and your friend receives it, you two should never use those keys again. It is simply too risky – if someone is watching your communications and knows you have sent 2-3 messages in the same encrypted pattern, they can deduce the private key.

How to (try) to break RSA

Your online privacy depends on the impossible-to-break nature of RSA. There are a class of problems in computer science which do not have any known algorithms, NP (Non-Deterministic Polynomial Time) problems. Factoring large numbers is an NP problem. In other words, once you multiply two huge, beastly prime numbers, it is impossible for someone to find out what those particular primes are if they only have the product. If it were easy, people could factor n to find p,q and reverse engineer the private key.

The reason why factoring is such a difficult problem has to do with the Fundamental Theorem of Arithmetic. The theorem states that every number can be broken down in to a product of prime numbers. For instance:

24 = 8*3 = 2^3 * 3
100 = 5*20 = 2^2*5
2040 = 2*1020 = 2^2*510 = 2^3*255 = 2^3*5*51 = 2^3*5*3*17
1923 = 3*641

If you choose two prime numbers to multiply together, the product can only be broken down into two numbers, like the number 1923. However, if you choose any two large numbers, you are not guaranteed this quality, like in the number 2040. There are many factors that divide 2040, but only two that will divide 1923. In other words, the chances of finding factors for 2040 are much higher than the chances for finding the factors of 1923. The odds diminish significantly the larger and larger you make the prime numbers – that’s why RSA keys are thousands of digits long.

Two relatively simple examples:

1.
n = p*q
n = 810932741738789015214091083277
p = 876465752114783
q = 925230380972819

An easier problem:
2.
\hat{n} = 55
\hat{p} = 5
\hat{q} = 11

How would you begin factoring numbers like n or \hat{n}? The first thing to come to mind might be trying to divide n or \hat{n} by all of the numbers before them until you find a number that evenly divides out, kind of like:

\frac{55}{1},\frac{55}{2},\frac{55}{3},\frac{55}{4},\frac{55}{5}...

For \hat{n}, see that once you only take a few steps to reach 5. Instantly, you see \frac{55}{5} = 11. Success! This method will certainly find factors for both n and \hat{n}, but it will take much longer for you to find the factors of n than \hat{n} simply because there is such a large difference in magnitude. If you tell the computer to do these steps with number on the order of the same magnitude as n, you would be waiting a really long time for the results. There are research labs that are trying to work on new ways to factor numbers that are much, much larger than n (n is thirty digits long. Actual RSA keys are thousands of digits long). There is a cleverer solution. A research paper by Eric Bach (1991) explores the Pollard’s Rho method for factoring prime numbers. It is a method grounded in probability theory, and draws on the implications of the Birthday Paradox.

The birthday paradox

Essentially what the Birthday Paradox asks is: How many people do you need to talk to before the probability of two people sharing the same birthday (month and day) is 50%? If you ask 367 people, the probability is certainly 100%. If you ask two people, the probability is 0.27%.

Click here for a demonstration of the results.

As it turns out, you only need to ask ~23 people before you have a 50-50 chance of a collision of 2 people born on the same day.

 

The take-away point of the Birthday Paradox is that you need a much smaller sample size (23 people) out of a much larger range of possibilities (367 people) to increase your odds of a collision to as high as 50%. 

 
Pollard’s rho works on this principle. Of the range 1... n, you only need to sample w << n numbers to find the factors that you’re looking for. Of course, when n is very, very large, the algorithm doesn’t work as fast. But for numbers that are about 30 digits long, Pollard’s rho outperforms trial division enormously.

If you’re truly curious, this is a neat video that explains how to solve the problem. It’s not necessary to watch this to understand Pollard’s rho.

Pollard’s rho

The Pollard’s rho algorithm is a surprisingly simple algorithm for factoring numbers. Some people may consider it outdated by today’s standards of cutting-edge number factoring algorithms, but it outperforms trial division (walking through every number from 1 until n) by a significant order of magnitude. John M. Pollard came up with this algorithm in 1975 and Richard Brent added a few modifications to it in 1980. This is the basic scheme:

  • Generate k random numbers based on the magnitude of the number you would like to factor. The “random walk” is not actually random because we are going to use a function f(x) to generate these numbers, however it will look and feel random.
    • f(x) = x^2 + a \mod n, where a is a computer generated random number. f(x) is recursive, which means that whatever you get from the first iteration of f(x), you take that result and you use it as your new input for the function. Another way to see it: x_{n+1} = f(x_n)
  • We can pick numbers x_1, ... x_w and instead of asking if any one of those random numbers divides n, we can ask if the greatest common denominator of n and the difference between two random numbers is greater than 1. That is: GCD(|x_i - x_j|,n)>1. Asking for a non-trivial factor in common increases the number of chances for successes. Thus, if GCD(|x_i - x_j|,n)>1, GCD(|x_i - x_j|,n) is either p or q.
    • The reason why we have more chances of success when looking for GCD(|x_i - x_j|,n)>1 is because when you ask how many numbers divide n, your answer is two: p,q. When you ask how many multiples of p,q share factors with n, you have precisely p+q-2 numbers because:
      p,2p,3p,4p,5p, ... (q-1)p, q,2q,3q,4q,...(p-1)q

An example

You can implement Pollard’s rho in any programming language you please. It is fortuitous that Java has a BigInteger package that can handle numbers with more than 20 digits, so it seems the most natural to me to code this up in Java. This is just a snippet of pseudocode (Pseudocode is not actually code that would run on a computer. Its purpose is to illustrate what steps should happen at which point of the algorithm for clarification). The phrase “while” means “while the statement is true, repeat this operation.”

set x = 2
set y = 2
while ( y does not equal x )
    find f(x)
    find f(f(y))
    if GCD( | y - x | , n) > 1
        p = GCD( | y - x | , n)
        stop looping
return p

Notice that we calculate f(f(y)). This is the change that Floyd made to Pollard’s algorithm. When you run the loop without this extra calculation, sometimes you can get stuck in an infinitely looping disaster like below:

2,10,\mathbf{16,23,29},13,\mathbf{16,23,29},13,...

To prevent this from happening, Floyd said something like, “Let’s pretend there are two racers on a circular track. They will start at the same time and place, but runner B will run twice as fast as runner A. When runner B passes runner A (the stopping condition for the loop, y = x), we know that we are caught in a repeating cycle. Tell the computer to stop looping. Restart the runners with a new random seed and try again.” Floyd’s cycle-detection scheme saves us from worrying about repeating values.

Be glad computers can’t factor

The people who work in cryptography and cybersecurity are notoriously paranoid. When they get wind of the slightest hint of anyone coming up with a way to factor numbers that are hundreds of digits long, they instantly get to work to create bigger and bigger keys. Their worry is rightly justified, though. If someone were to come up with a way to quickly factor numbers, we’d be cooked. Say good-bye to online shopping, online banking, online bill pay, online taxes, direct deposit,… pretty much online-anything.

Advertisements