Finding prime numbers with Eratosthenes

A prime is a natural number greater than 1 that has no positive divisors other than 1 and itself. For example:

7 is prime because it has only 2 divisors: 1 and itself.
6 is not prime because it has many divisors: 1, 2, 3, and 6.

Here's an interesting problem: write a program that finds all prime numbers between 0 and N. For N = 20, the solution would be: 2, 3, 5, 7, 11, 13, 17, and 19.

In this article we will cover 2 solutions. One naive, and one called the "sieve of Eratosthenes" that I find elegant. Both will be written in Python.

The naive solution

The best place to start when confronted with a problem is with a naive solution. It means we will code a basic program with no performance considerations.

First, we write a function to check whether or not the parameter x is prime.

def is_prime(x):
  # If x is less than 2, it is not prime by definition
  if x < 2:
    return False

  # Go though all the numbers between 2 and x (x excluded)
  for i in range(2, x):
    # If the remainder of x divided by i is zero
    if x % i == 0:
      # Then i is a divisor of x, so x is not prime
      return False

  # If we found no divisor, then x is prime
  return True

Second, we go through all numbers between 0 and N one by one, and call is_prime() to test if each number is prime or not.

def all_primes(n):
  # List that will contain all prime numbers
  primes = []

  # Go though all numbers between 0 and n (n included)
  for i in range(0, n+1):
    # If the number i is prime, then add it to the list
    if is_prime(i):
      primes.append(i)

  # Return all prime numbers
  return primes

And we are done! Let's run our program a few times while measuring how long it takes.

all_primes(10000) # 0.2ms
all_primes(100000) # 9ms
all_primes(1000000) # 870ms
all_primes(10000000) # 115,000ms

It's pretty slow for large values of N. That's because we have 2 nested for loops that both need to go through many iterations. The time complexity of this algorithm is O(N2), which is not great.

The sieve of Eratosthenes algorithm

Eratosthenes is a Greek mathematician that was born in 276 BC. He discovered a clever algorithm named after him that will solve our problem in an elegant and efficient way.

Below is a description of how the algorithm works for N = 29.

First we list all numbers between 2 and 29. Each number can be in one of 2 states: green or red. By default they are all green.

We start from the beginning of the list, and pick the first green number. In our case that's 2.

Then we go through the remainder of the list to find all numbers that are multiples of 2, and we set them in red. They are: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, and 28.

Now we start again: find the first green number (3) and mark in red all its multiples (6, 9, 12, 15, 18, 21, 24, 27). Note that some of them were already red, so their color won't change.

And we repeat the same thing: green (5) and red (10, 15, 20, 25).

If we continue this process for all numbers, we will end up with this.

As you may have guessed, all numbers in green are primes. Pretty cool!

Here's a short explanation of why this works:

By construction, red numbers are multiples of at least one green numbers. It means that red numbers cannot be primes.
Green numbers were never set in red. It means they have only 2 divisors (1 and themselves), so they must be primes.

The sieve of Eratosthenes code

Here's the full code of the algorithm, with many comments to explain how it works.

Note that to store numbers and their corresponding colors, we use a list of booleans. For example [ False, False, True ] would mean:

False in 0th position → 0 is red (not prime).
False in 1st position → 1 is red (not prime).
True in 2nd position → 2 is green (prime).

def all_primes(n):
  # List that will contain all the prime numbers
  primes = []

  # Initialize our table with n times the boolean True
  table = [ True ] * n

  # The numbers 0 and 1 are not primes by definition
  # So we set them as False in the table
  table[0] = table[1] = False

  # Go though all the elements of the table
  for i, boolean in enumerate(table):
    # If the boolean is set to True
    if boolean:
      # Add i it to our list of primes
      primes.append(i)

      # Go through all multiples of i
      # That's i+i, i+i+i, and so on up to n
      for j in range(i+i, n, i):
        # And set their flags to False
        table[j] = False

  # Return all primes
  return primes

The time complexity of this algorithm is O(N log log N), which is much better than the previous O(N2). So if we try our new program with the same values, we get vastly improved performances.

all_primes(10000) # 0.001ms (previously 0.2ms)
all_primes(100000) # 0.4ms (previously 9ms)
all_primes(1000000) # 3.5ms (previously 870ms)
all_primes(10000000) # 36.5ms (previously 115,000ms)

Conclusion

Prime numbers are an interesting topic, and the sieve of Erastosthenes is a very efficient and elegant solution to find them. That's why it's one of my favorite algorithms.

For your information, all the code shown in this article is available on GitHub.