简体   繁体   中英

Speeding up summation in Python

I have a large list with numbers n1 in listn1 . I want to add all multiples of each number, if the multiplicand is not in another list listn2 (prime numbers with specific characteristics) and the product is below a maximum max_n . The multiplicand is in most cases below 10, but can go up to 100,000. My code so far looks like:

s = 0
for n1 in listn1:
    s += sum(n1 * i for i in range(1, 1 + max_n // n1) if i not in listn2)

The problem: this approach is sloooow. It takes seconds to calculate listn1 and listn2 , so I know that there are more than a million numbers to add. But I started the summation yesterday evening and it is still running this morning.

Is there a Pythonic way to speed up this summation?

I have 2 suggestions for you.

First of all , you don't have multiply i with n1 at each iteration. You can replace

s += sum(n1 * i for i in range(1, 1 + max_n // n1) if i not in listn2)

with

s += n1 * sum(i for i in range(1, 1 + max_n // n1) if i not in listn2)

They are totally same.

Secondly , without if i not in listn2 condition, you have a simple summation:

sum(i for i in range(1, 1 + max_n // n1)

This is same with sum([1, 2, 3, 4, 5, 6, 7, 8, ..., (max_n // n1)]) , and equals (max_n // n1) * (1 + max_n // n1) / 2 . For a simply example, take a look at this .

To handle if i not in listn2 condition, if your listn2 is smaller, you can sum listn2 instead of listn1 .

So , find the sum of listn1 and subtract the items in listn2 :

def sum_until(l, max):
    return sum([x for x in l if x < max])

listn2 = list(set(listn2))

for n1 in listn1:
    finish = max_n // n1
    s += n1 * (finish * (finish + 1) / 2 - sum_until(listn2, finish)) 

EDIT:

I guess NumPy would be faster for sum. Make listn2 a numpy array:

import numpy as np

listn2 = np.array(list(set(listn2))) 

And use this sum_until function:

def sum_until(listn2, max):
    l = listn2[np.where(listn2 <= max)]
    return int(np.sum(l))

Taking the suggestions on board, I re-wrote the code to

setn2 = set(listn2)
s = 0
for n1 in listn1:
    s += n1 * (max_n * (max_n + 1) // 2 - sum(i for i in range(1, 1 + max_n // n1) if i in setn2))

Instead of hours, the summation now takes seconds.

Several hours later

Coming across this old question of mine, it turned out that retrospectively, I did the right thing. The idea mentioned here in this thread to use numpy.sum() instead was well intended but wrong, as shown here . If you work in numpy, fine, but if you have a Python list, use comprehensions.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM