简体   繁体   中英

Efficient Prime Factorization for large numbers

I've been working on a little problem where I need to compute 18-digit numbers into their respective prime factorization. Everything compiles and it runs just fine, considering that it actually works, but I am looking to reduce the run time of the prime factorization. I have implemented recursion and threading but I think I might need some help in understanding possible algorithms for large number computation.

Every time I run this on the 4 numbers I have pre-made, it takes about 10 seconds. I would like to reduce this to possibly 0.06 seconds if there are any ideas out there.

I noticed a few algorithms like Sieve of Eratosthenes and producing a list of all the prime numbers prior to computing. I'm just wondering if someone could elaborate on it. For instance, I'm having issues understanding how to implement Sieve of Eratosthenes into my program or if it would even be a good idea. Any and all pointers on how to approach this better would be really helpful!

Here is my code:

#include <iostream>
#include <thread>
#include <vector>
#include <chrono>

using namespace std;
using namespace std::chrono;

vector<thread> threads;
vector<long long> inputVector;
bool developer = false; 
vector<unsigned long long> factor_base;
vector<long long> primeVector;

class PrimeNumber
{
    long long initValue;        // the number being prime factored
    vector<long long> factors;  // all of the factor values
public:
    void setInitValue(long long n)
    {
        initValue = n;
    }
    void addToVector(long long m)
    {
        factors.push_back(m);
    }
    void setVector(vector<long long> m)
    {
        factors = m;
    }
    long long getInitValue()
    {
        return initValue;
    }
    vector<long long> getVector()
    {
        return factors;
    }
};

vector<PrimeNumber> primes;

// find primes recursively and have them returned in vectors
vector<long long> getPrimes(long long n, vector<long long> vec)
{
    double sqrt_of_n = sqrt(n);

    for (int i = 2; i <= sqrt_of_n; i++)
    {
        if (n % i == 0) 
        {
            return vec.push_back(i), getPrimes(n / i, vec); //cause recursion
        }
    }

    // pick up the last prime factorization number
    vec.push_back(n);

    //return the finished vector
    return vec;
}

void getUserInput()
{
    long long input = -1;
    cout << "Enter all of the numbers to find their prime factors. Enter 0 to compute" << endl;
    do
    {
        cin >> input;
        if (input == 0)
        {
            break;
        }
        inputVector.push_back(input);
    } while (input != 0);
}

int main() 
{

    vector<long long> temp1;   // empty vector
    vector<long long> result1; // temp vector

    if (developer == false)
    {
        getUserInput();
    }
    else
    {
        cout << "developer mode active" << endl;
        long long a1 = 771895004973090566;
        long long b1 = 788380500764597944;
        long long a2 = 100020000004324000;
        long long b2 = 200023423420000000;
        inputVector.push_back(a1);
        inputVector.push_back(b2);
        inputVector.push_back(b1);
        inputVector.push_back(a2);
    }

    high_resolution_clock::time_point time1 = high_resolution_clock::now();

    // give each thread a number to comput within the recursive function
    for (int i = 0; i < inputVector.size(); i++)
    {   
        PrimeNumber prime;
        prime.setInitValue(inputVector.at(i));
        threads.push_back(thread([&]{
            prime.setVector(result1 = getPrimes(inputVector.at(i), temp1));
            primes.push_back(prime);
        }));
    }

    // allow all of the threads to join back together.
    for (auto& th : threads)
    {
        cout << th.get_id() << endl;
        th.join();
    }

    high_resolution_clock::time_point time2 = high_resolution_clock::now();

    // print all of the information
    for (int i = 0; i < primes.size(); i++)
    {
        vector<long long> temp = primes.at(i).getVector();

        for (int m = 0; m < temp.size(); m++)
        {
            cout << temp.at(m) << " ";
        }
        cout << endl;
    }

    cout << endl;

    // so the running time
    auto duration = duration_cast<microseconds>(time2 - time1).count();

    cout << "Duration: " << (duration / 1000000.0) << endl;

    return 0;
}

Trial division is only suitable for factoring small numbers. For n up to 2^64, you'll need a better algorithm: I recommend starting with wheel factorization to get the small factors, followed by Pollard's rho algorithm to get the rest. Where trial division is O(sqrt(n)), rho is O(sqrt(sqrt(n))), so it's much faster. For 2^64, sqrt(n) = 2^32, but sqrt(sqrt(n)) = 2^16, which is a huge improvement. You should expect to factor your numbers in a few milliseconds, at most.

I don't have C++ code for factoring, but I do have readable Python code. Let me know if you want me to post it. If you want to know more about wheel factorization and the rho algorithm, I have lots of prime number stuff at my blog .

for(int i  = 2; i * i <= n; ++i) //no sqrt, please
{
    while(n%i == 0) //while, not if
    {
         factors.push_back(i);
         n/=i;
    }
}
if(n != 1)
{
    factors.push_back(n);
}

This is basically a neater implementation of your algorithm. Its complexity is sqrt of N. It will work pretty quickly even for a 18-digit number, but only if the prime factors are all small. If it's a product of two large prime numbers, or worse, is prime itself, this will run for approximately 10 seconds.

I found that the Sieve of Eratosthenes on your modern processor thrashes the cache, so that main memory bandwidth is the limiting factor. I found this when trying to run multiple threads and failing to speed things up by as much as I was hoping for.

So, I recommend breaking the sieve into segments which will fit in the L3 cache. Also, if you exclude multiples of 2, 3 and 5 from the bit vector, then an 8 bit byte can represent 30 numbers on the number line, with 1 bit for each number which is 1, 7, 11, 13, 17, 19, 23 or 29 modulo 30 -- so that a bit map for primes up to 10^9 takes ~32MB -- 10^9 / (30 * 1024 * 1024). This is almost half the size of a bit map which just excludes multiples of 2, which is ~60MB -- 10^9 / (2 * 8 * 1024 * 1024).

Obviously, to run the sieve up to 10^9 you need the primes up to sqrt(10^9) -- which requires some 1,055 bytes, from which you can generate any part of the full sieve up to 10^9.

FWIW, the results I get on a modest AMD Phenom II x6 1090T (8MB L3 cache), for primes up to 10^9 are:

  1. 1 core,   1 segment    3.260 seconds elapsed
  2. 5 cores,  1 segment    1.830 seconds elapsed
  3. 1 core,   8 segments   1.800 seconds elapsed
  4. 5 cores, 40 segments   0.370 seconds elapsed

where by "segment" I mean a part of the sieve. In this case the sieve is ~32MB, so where there are multiple segments they are using about 4MB of L3 cache at any one time.

Those times include the time required to scan the completed sieve and generate all the primes as an array of integers. That takes about 0.5 secs of CPU ! So, to run the sieve without actually extracting the primes from it, takes 0.270 seconds elapsed in case (4) above.

FWIW, I get a small improvement -- to 0.240 seconds in case (4) -- by initialising each segment using a precalculated pattern that removes multiples of 7, 11, 13 and 17. That pattern is 17,017 bytes.

Clearly, to do a single factorization in 0.06 secs... you need the sieve to be pre-computed !

A simple speedup of two can easily be achieved by changing your loop:

if (n % 2) {
    return vec.push_back(i), getPrimes(n / i, vec);
}

for (int i = 3; i <= sqrt_of_n; i += 2)
{
    if (n % i == 0) 
    {
        return vec.push_back(i), getPrimes(n / i, vec); //cause recursion
    }
}

You first should test the number by two. Then, starting from 3 you test again incrementing your loop by two at a time. You already know thay 4, 6, 8, ... are even numbers and have 2 as a factor. Testing against even numbers you're reducing your complexity by half.

To factor a number N you only need the prime numbers <= sqrt(N). For a 18 digit number you only need to test against all primes less than 1e9 , and since there are 98 millon primes less than 2e9 you can easily store 100 millon numbers on today's computers and run the factoring in parallel. If each number takes 8 bytes of RAM ( int64_t ), 100 millon primes would take 800 MB of memory. This algorithm is the classic solution to SPOJ problem #2, Prime Generator .

The best way to list all the small primes that can fit on a 32-bit int is to build a Sieve of Eratostenes. I told you that we need the primes less than sqrt(N) to factor any N, so to factor 64 bit integers you need all the primes that fit as a 32-bit number.

Algorithm :

  1. Divide number by 2 until it is not divisible by it ,store the result and display it .
  2. Divide number by 3 until it is not divisible by 3 and display the result,
  3. Repeat same process for 5,7... etc till square root of n.
  4. If at the end resultant number is prime , display its count as 1.

     int main() { long long n; cin >> n; int count =0 ; while(!(n%2)){ n = n / 2; count++; } if(count > 0) { cout<<"2^"<<count<<" "; } for(long long i=3;i<=sqrt(n) ; i+=2){ count=0; while(n%i == 0){ count++; n = n/i; } if(count){ cout << i <<"^" <<count<<" "; } } if(n>2){ cout<<n <<"^1"; } return 0; } 

Input : 100000000 Output 2^8 5^8

Algorithm for factoring integers, so simple, that it may be realized even on an abacus.

void start()
{ 
   int a=4252361;    //  integer to factorize
   int b=1,  c,  d=a-1;  

   while ((a > b) && (d != b))
  {
    if (d > b)
     {
       c=c+b;
       a=a-1;
       d=a-c-1;
     }
    if (d < b)
     {
       c=b-d;
       b=b+1;
       a=a-1;
       d=a-c-1;
     }         
  }
    if ((d == b)&&(a > b)) 
  Alert ("a = ",  a-1,  " * ", b+1); 

    if ((d < b)&&(a <= b)) 
  Alert ("a  is a prime");

  return;
}

The algorithm is composed by me, Miliaev Viktor Konstantinovich, born 26, July, 1950. It is written in MQL4 15.12. 2017. E-mail: tenfacet27@gmail.com

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM