
Itertools to speed up nested loops in Beautiful Soup

This Python 3 code works, but with four nested loops (and more inside them) it runs very slowly.

How can I use itertools to speed up the loops a bit?

For 25 rows with 4 columns of data this takes around 20 seconds.

import bs4 as bs
import urllib.request
import time

start_time = time.time()

# One list per table column: names, cities, streets, street numbers.
a = []
b = []
c = []
d = []

for z in range(1, 10):
    # Download and parse each page.
    source = urllib.request.urlopen(f'https://X.com/id={z}').read()
    soup = bs.BeautifulSoup(source, 'html.parser')

    # Collect each column of the table into its own list, row by row.
    for i in range(0, 50):
        for name in soup.find_all('span', id=f"tblRightHolders:{i}:cellRHSurnameName"):
            a.insert(i, name.string)
        for city in soup.find_all('span', id=f"tblRightHolders:{i}:cellRHPlace"):
            b.insert(i, city.string)
        for street in soup.find_all('span', id=f"tblRightHolders:{i}:cellRHStreet"):
            c.insert(i, street.string)
        for number in soup.find_all('span', id=f"tblRightHolders:{i}:cellRHNumber"):
            d.insert(i, number.string)

# Zip the four column lists into rows and print "name - city - street - number".
X = [list(e) for e in zip(a, b, c, d)]
for nested in X:
    print(" - ".join(map(str, nested)))

print("--- %s seconds ---" % (time.time() - start_time))

The data gets output like this:

Name/Surname - City - Street - Street number

I do not think itertools will speed this up; it only offers a nicer, more readable way to write the loops. If you want to speed it up, there are several options:

  1. Use joblib for parallelization
  2. Try a just-in-time compiler such as numba, but you would probably have to rewrite the code, because the BeautifulSoup calls are most likely not compatible with numba
  3. Rewrite the critical code in C/C++, Rust, or Cython

Those last two are most likely overkill. Go with simple parallelism using joblib if you can (i.e., if you have multiple cores available). Itertools won't speed things up; it can only make your code nicer.
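A minimal sketch of what the joblib approach could look like, reusing the placeholder URL and span ids from your code; the fetch_page helper and the n_jobs value are my own assumptions, not part of your original code:

import bs4 as bs
import urllib.request
from joblib import Parallel, delayed

def fetch_page(z):
    # Download and parse a single page (same placeholder URL as in the question).
    source = urllib.request.urlopen(f'https://X.com/id={z}').read()
    soup = bs.BeautifulSoup(source, 'html.parser')

    rows = []
    for i in range(0, 50):
        # Look up the four cells of row i; skip the row if any cell is missing.
        cells = [soup.find('span', id=f"tblRightHolders:{i}:cell{col}")
                 for col in ("RHSurnameName", "RHPlace", "RHStreet", "RHNumber")]
        if any(cell is None for cell in cells):
            continue
        rows.append([cell.string for cell in cells])
    return rows

# One job per page; n_jobs=-1 uses every available core.
pages = Parallel(n_jobs=-1)(delayed(fetch_page)(z) for z in range(1, 10))

# Flatten the per-page results and print them in the original "a - b - c - d" format.
for rows in pages:
    for row in rows:
        print(" - ".join(map(str, row)))

Each page is handled independently, so parallelizing over the page ids is the natural split; the per-row work stays exactly as in your code.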

[edit] I do recommend timing your code first. If most of the time is spent downloading the pages, you can still use joblib, but with threads instead of processes. Just today I was doing something similar with 100 separate threads for webpage requests.
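If the downloads really are the bottleneck, the only change to the sketch above is the backend; joblib can run the same jobs in threads instead of processes (the n_jobs value is again just an assumption):

# prefer="threads" swaps the process pool for a thread pool,
# which is enough when the work is I/O-bound rather than CPU-bound.
pages = Parallel(n_jobs=9, prefer="threads")(
    delayed(fetch_page)(z) for z in range(1, 10)
)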
