The fundamental theorem of arithmetic states that ever positive integer can be written uniquely as a product of primes

example:

65340=22335112

Trial division

Trial division is the simplest algorithm for factoring an integer, we assume that s and t are factors of a number n such that n=st and st (note that s and t do not need to be prime numbers), when a divisor s is found then n/s is also a factor

vector<int> trial_division(int n) {
  for (int i = 2; i * i <= n; i += 1) {
    if (n % i == 0) {
      // `n` is a composite number
      return vector<int> {i, n / i};
    }
  }
  // n is a prime number
  return vector<int> {n};
}

Fermat factorization

Fermat’s observation was to write an integer as the difference of squares

(1)n=x2y2(2)=(x+y)(xy)

Assuming that s and t are odd factors of n such that n=st and st we can find x and y such that

s=xyt=x+y

Adding both equations

s+t=2xx=s+t2

Also

y=ts2

Since we assumed that s and t are odd numbers, their difference is an even number which is divisible by 2 therefore x and y are integers, since s>1 and ts we find that x1 and y0

From (1) we also know that x=n+y2 and hence xn, also x=s+t2 and we know that the upper bound of s happens when s is as close as t as possible, given that st, xt+t2n

Implementation notes: since s and t are odd numbers, their product n is also an odd number, therefore the implementation below works with odd values of n

/**
 * Factorization of an odd number `n` based on Fermat's
 * factorization algorithm
 *
 * @param  {int} n
 * @return {vector<int>} a vector with two odd integers if `n` is not a
 * prime number, a single integer if `n` is a prime number
 */
vector<int> fermat_factorization(int n) {
  for (int x = (int) ceil(sqrt(n)); x <= n; x += 1) {
    int ySquared = x * x - n;
    // check if `y` is the square of some number
    int y = (int) sqrt(ySquared);
    if (y * y == ySquared) {
      int s = x - y;
      int t = x + y;
      // `s` must be > 1
      if (s != 1 && t != n) {
        return vector<int> {s, t};
      }
    }
  }
  // n is a prime number
  return vector<int> {n};
}

Pollard’s Rho factorization

Pollard’s Rho factorization is a probabilistic factorization algorithm based on the assumption that a number n is a composite number and the following facts:

  • since n is a composite number there must be a factor d
  • let a, b two positive integers, if ab(modd) then the difference ab is a multiple of d, since n is also a multiple of d some multiple of d is a divisor of n and ab, particularly gcd(ab,n) is a divisor of n, let gcd(ab,n)>1 then we have found two factors of n (gcd(ab,n),ngcd(ab,n))

Now the problem is reduced to find a and b such that gcd(ab,n)>1, we can use the following algorithm which picks random numbers in the range [1,n1]

let `n` be the number to be factorized
let `x` be an array of integers

x[0] = random integer in the range [1, n - 1]
while we haven't two numbers such that `gcd(x_i, x_j, n) > 1`
  x[i] = random integer in the range [1, n - 1]
  for all `j < i` and `j >= 0`
    if `gcd(x[i] - x[j], n) > 1`
      return x[i], x[j]

A simulation can find a and b with a probability  50 after n iterations, the algorithm above is not very helpful though since at the k iteration we have to do k1 pairwise checks

Here’s another algorithm to pick random numbers, let x be an integer in the range [1,n1], a function that will generate a number in the range [1,n1] based on a previous number is

f(x)=x2+c(modn)

Because there are only n1 possible values our generator will eventually fall into a cycle, for example let n=55,c=2,x=2

x0=2x1=(22+2)(mod55)=6x2=(62+2)(mod55)=38x3=(382+2)(mod55)=16x4=(162+2)(mod55)=38 which is equal to x2 

Pollard detected the cycle using Floyd’s cycle-finding algorithm which is based on two pointers which move through a sequence at different speeds, one moves a unit and the other moves two units each time, if there’s a cycle eventually the two pointers will encounter at some element belonging to the cycle, if we’ve analyzed all the elements of the sequence and saw not a single contiguous pair fulfills gcd(xixi+1,n)>1 we need to choose other values for x0,a and rerun the algorithm

// C++11
#include <random>

/**
 * Computes a factor of `n` which is greater than `1`
 * @param {long long} n The number to be factorized
 * @return {long long} A positive integer which is a factor of `n`
 * when the algorithm successfully finds a factor of `n`, -1 otherwise
 */
long long pollard_rho(long long n) {
  if (n % 2 == 0) {
    return 2;
  }

  std::random_device rd;
  std::mt19937 engine(rd());
  std::uniform_int_distribution<long long> dis(1, n - 1);

  long long x = dis(engine);
  long long c = dis(engine);
  long long y = x;   // y = x^2 + c (mod n)
  do {
    // tortoise goes x = f(x)
    x = ((x * x) % n + c) % n;

    // hare goes y = f(f(y))
    y = ((y * y) % n + c) % n;
    y = ((y * y) % n + c) % n;

    long long gcd = __gcd(abs(x - y), n);
    if (gcd > 1) {
      return gcd;
    }
  } while (x != y);

  return -1;
}

/**
 * Pollard rho factorization runner, it makes multiple calls to
 * `pollard_rho` until a factor is found
 * @param {long long} n The number to be factorized
 * @return {long long} A factor of `n` (it is `n` when `n` is a prime number)
 */
long long pollard_rho_factorization(long long n) {
  long long factor;
  do {
    factor = pollard_rho(n);
  } while(factor < 0);
  return factor;
}

Eratosthenes Sieve factorization of a range

We can also compute the factorization of a number by modifying the sieve of Erathostenes, remember that each state of the sieve hold a boolean telling if the number is prime or not, this time each state of the sieve will hold a pair of numbers

  • the lowest prime that is a divisor of any index i
  • the maximum power of the lowest prime computed above (optional)

Let’s represent n as p1a1p2a2pnan, since we’re hold for each position the lowest prime and its the maximum power, the state stored at the position n of the sieve will be p1a1, if we divide n by this number we will move to the state p2a2pnan, this recursive process is run until the current state reached in the sieve is 1

// let `p` be the smallest prime factor of the index `i`, each element
// contains a pair `(p, x)` such that `p^x` is a divisor of `i`
// e.g.
//
//    8 = (2, 3)
//    15 = (3, 1)
//    6 = (2, 1)
//
vector<pair<int, int> > lp;

void eratosthenes_sieve_factorization(long long n) {
  pair<int, int> unvisited(-1, -1);

  // (-1, -1) is an unvisited state
  lp.assign(n + 1, unvisited);

  for (int i = 2; i * i <= n; i += 1) {
    if (lp[i] == unvisited) {
      // if an index is in an unvisited state it's a prime number
      pair<int, int> base(i, 1);
      lp[i] = base;
      for (int j = i * i; j <= n; j += i) {
        if (lp[j] == unvisited) {
          // if a multiple of the prime number is in an unvisited
          // state that means that it's lowest prime divisor is
          // the current index `i`
          lp[j] = base;
          if (lp[j / i] != unvisited) {
            // `j` is a multiple of `i`, in fact `j = i^x` because
            // only numbers which don't have other factors but `i`
            // reach this point, to accumulate the powers it's enough
            // find out the power of `j / i`
            lp[j].second += lp[j / i].second;
          }
        }
      }
    }
  }

  // all the prime numbers > sqrt(n) will have an unvisited state
  // changing the unvisited state to prime_number^1
  int sqrtN = sqrt(n);
  if (sqrtN % 2 == 0) {
    sqrtN += 1;
  }
  for (int i = sqrtN; i <= n; i += 2) {
    if (lp[i] == unvisited) {
      lp[i] = pair<int, int>(i, 1);
    }
  }
}