This is a post from a series in my project.

In this age of information, cybersecurity is everything. It’s the reason why we have things like online banking and secure payment transactions for online shopping. Without certain security measures, your information is open on the web for anyone to grab. Cryptography is such a broad and deeply researched subject – this short post will not do it justice by any means.

Aside: The terms *hacking* and *cracking* have extremely different meanings, although they are usually taken to mean the same thing. *Hacking* pertains to making new discoveries, like scientists do when they try to cure cancer. *Cracking* means making discoveries for malicious intent, like people who steal your identity by giving your computer a keystroke-logging virus to steal your passwords or credit card numbers.

# What is Cryptography?

From the Greeks (as usual): *crypto* means *secret* and *graphy* means *writing*. Cryptography is the study of techniques for encrypting and decrypting secrets. There are many ways to code messages or information so that they are unreadable by others. Some of these techniques are easily broken, and some are hot research topics because they are so difficult to break. In 1977, three number theorists (Ron Rivest, Adi Shamir and Leonard Adleman) came up with an algorithm, called “RSA”, for creating unbreakable computer “keys.” The basic premise for their idea is summarized below:

Imagine you want to send a secret message to your friend Alice. You put the message in a box that has two keyholes and one key is taped to its side, key A. Whenever the box is locked with key A, it can only be opened by key B. Similarly, if you lock the box with key B, you can only open it with key A. Key A is public. Anyone can use it (it’s taped to the box). Key B is private. Only your friend has access to this key. Here is the procedure for securing your message:

- Lock your box with key A.
- Send the message to your friend. Since only key B can open the box, no one can look inside except your friend.
- Your friend receives the box and opens it with key B, fully confident that no one else has read the message.

In this picture, Eve is trying to read the message between Alice and Bob, but she can’t do it unless she has the private key.

The trick in this method is in the kinds of keys that you have. First of all, they are digital keys, since all of this is taking place on your computer. RSA-generated keys are created by multiplying huge (like thousands of digits) prime numbers together to make another, bigger number. Just a refresher: prime numbers are special numbers that cannot be divided by any other number than themselves and 1 (e.g., 1, 3, 5, 7, 11, 13, 17, 19, 23, …).

Here is the basic idea:

- Multiply two large prime numbers
- Next, find
- Find a number that is between and not a factor of . In other words, .
- Calculate so that . Don’t worry: a certain theorem of number theory says that will always exist.
- Finally, we erase what we have for
- forms the public key and the private key.

To illustrate this: the epitome of nerdiness. A specific example follows.

- Choose and . We have and
- Let us choose
- We have to find so that . We have and
- The private key is . The public key is

The encryption process takes a message , which is a number from , and transforms it by the following operation:

Here, the operator represents the modulus. A modulus is the remainder after division (e.g., with and ).

For example, the public key and message , becomes .

Undoing this process to read is called decryption. To decrypt, we have to compute:

We have the original message again. Obviously, there are a lot of steps to “secretize” your information, but they are worth it. If you only have the public key , it is impossible to figure out what the original message is. But, if you know what the factors of are (), then you can easily back-calculate all the steps required to find the private key . There is one caveat to this method, though. Public and private keys are for **one-time-use** only. After you send the message and your friend receives it, you two should **never** use those keys again. It is simply too risky – if someone is watching your communications and knows you have sent 2-3 messages in the same encrypted pattern, they can deduce the private key.

# How to (try) to break RSA

Your online privacy depends on the impossible-to-break nature of RSA. There are a class of problems in computer science which do not have any known algorithms, NP (Non-Deterministic Polynomial Time) problems. Factoring large numbers is an NP problem. In other words, once you multiply two huge, beastly prime numbers, it is impossible for someone to find out what those particular primes are if they only have the product. If it were easy, people could factor to find and reverse engineer the private key.

The reason why factoring is such a difficult problem has to do with the Fundamental Theorem of Arithmetic. The theorem states that every number can be broken down in to a product of prime numbers. For instance:

If you choose two **prime** numbers to multiply together, the product can only be broken down into two numbers, like the number . However, if you choose **any** two large numbers, you are not guaranteed this quality, like in the number . There are many factors that divide , but only two that will divide . In other words, the chances of finding factors for are much higher than the chances for finding the factors of . The odds diminish significantly the larger and larger you make the prime numbers – that’s why RSA keys are thousands of digits long.

Two relatively simple examples:

1.

An easier problem:

2.

How would you begin factoring numbers like or ? The first thing to come to mind might be trying to divide or by all of the numbers before them until you find a number that evenly divides out, kind of like:

For , see that once you only take a few steps to reach . Instantly, you see . Success! This method will certainly find factors for both and , but it will take much longer for you to find the factors of than simply because there is such a large difference in magnitude. If you tell the computer to do these steps with number on the order of the same magnitude as , you would be waiting a really long time for the results. There are research labs that are trying to work on new ways to factor numbers that are much, much larger than ( is thirty digits long. Actual RSA keys are *thousands* of digits long). There is a cleverer solution. A research paper by Eric Bach (1991) explores the Pollard’s Rho method for factoring prime numbers. It is a method grounded in probability theory, and draws on the implications of the Birthday Paradox.

# The birthday paradox

Essentially what the Birthday Paradox asks is: How many people do you need to talk to before the probability of two people sharing the same birthday (month and day) is 50%? If you ask 367 people, the probability is certainly 100%. If you ask two people, the probability is 0.27%.

Click here for a demonstration of the results.

As it turns out, you only need to ask ~23 people before you have a 50-50 chance of a collision of 2 people born on the same day.

#### The take-away point of the Birthday Paradox is that you need a much smaller sample size (23 people) out of a much larger range of possibilities (367 people) to increase your odds of a collision to as high as 50%.

Pollard’s rho works on this principle. Of the range , you only need to sample numbers to find the factors that you’re looking for. Of course, when is very, very large, the algorithm doesn’t work as fast. But for numbers that are about 30 digits long, Pollard’s rho outperforms trial division enormously.

If you’re truly curious, this is a neat video that explains how to solve the problem. It’s not necessary to watch this to understand Pollard’s rho.

# Pollard’s rho

The Pollard’s rho algorithm is a surprisingly simple algorithm for factoring numbers. Some people may consider it outdated by today’s standards of cutting-edge number factoring algorithms, but it outperforms trial division (walking through every number from 1 until ) by a significant order of magnitude. John M. Pollard came up with this algorithm in 1975 and Richard Brent added a few modifications to it in 1980. This is the basic scheme:

- Generate random numbers based on the magnitude of the number you would like to factor. The “random walk” is not actually random because we are going to use a function to generate these numbers, however it will
*look*and*feel*random.- , where is a computer generated random number. is recursive, which means that whatever you get from the first iteration of , you take that result and you use it as your new input for the function. Another way to see it:

- We can pick numbers and instead of asking if any one of those random numbers divides , we can ask if the greatest common denominator of and the difference between two random numbers is greater than 1. That is: . Asking for a non-trivial factor in common increases the number of chances for successes. Thus, if , is either or .
- The reason why we have more chances of success when looking for is because when you ask how many numbers divide , your answer is two: . When you ask how many multiples of share factors with , you have precisely numbers because:

- The reason why we have more chances of success when looking for is because when you ask how many numbers divide , your answer is two: . When you ask how many multiples of share factors with , you have precisely numbers because:

# An example

You can implement Pollard’s rho in any programming language you please. It is fortuitous that Java has a BigInteger package that can handle numbers with more than 20 digits, so it seems the most natural to me to code this up in Java. This is just a snippet of pseudocode (Pseudocode is not actually code that would run on a computer. Its purpose is to illustrate what steps should happen at which point of the algorithm for clarification). The phrase “while” means “while the statement is true, repeat this operation.”

```
set x = 2
set y = 2
while ( y does not equal x )
find f(x)
find f(f(y))
if GCD( | y - x | , n) > 1
p = GCD( | y - x | , n)
stop looping
return p
```

Notice that we calculate . This is the change that Floyd made to Pollard’s algorithm. When you run the loop without this extra calculation, sometimes you can get stuck in an infinitely looping disaster like below:

To prevent this from happening, Floyd said something like, “Let’s pretend there are two racers on a circular track. They will start at the same time and place, but runner B will run twice as fast as runner A. When runner B passes runner A (the stopping condition for the loop, y = x), we know that we are caught in a repeating cycle. Tell the computer to stop looping. Restart the runners with a new random seed and try again.” Floyd’s cycle-detection scheme saves us from worrying about repeating values.

# Be glad computers can’t factor

The people who work in cryptography and cybersecurity are notoriously paranoid. When they get wind of the slightest hint of anyone coming up with a way to factor numbers that are hundreds of digits long, they instantly get to work to create bigger and bigger keys. Their worry is rightly justified, though. If someone were to come up with a way to quickly factor numbers, we’d be cooked. Say good-bye to online shopping, online banking, online bill pay, online taxes, direct deposit,… pretty much online-anything.