
How to make web-scraping faster? Django project

I'm building a web-scraping application with the Django framework and need some tips on speeding it up. Right now the page takes almost a minute to load while parsing just 3 URLs, which is a problem because I want to parse up to 10 URLs per page. As you can see, my code targets only one div, which I suspect is why the application runs so slowly. I'm thinking of targeting multiple divs to narrow down my "soup", but I've had difficulty with that in the past, so I'm hoping for some pointers.

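import requests
from bs4 import BeautifulSoup
from django.shortcuts import redirect, render

from .models import Stock, User  # assumption: both models live in this app's models.py


def stats(request):
    # Send visitors without a session back to the landing page.
    if 'user_id' not in request.session:
        return redirect('/')
    this_user = User.objects.filter(id=request.session['user_id'])
    this_stock = Stock.objects.filter(user_id=request.session['user_id'])
    progress_dict = []  # a list, despite the name; the template key is kept as-is
    # Pages are fetched one after another, so the network waits add up.
    for stock in this_stock:
        page = requests.get(stock.nasdaq_url, timeout=10)  # do not hang on a slow site
        soup = BeautifulSoup(page.content, 'html.parser')
        for number in soup.find_all('div', class_='ln0Gqe'):
            progress_dict.append(number.text)
    context = {
        "current_user": this_user[0].first_name,
        "progress_dict": progress_dict,
        "this_stock": this_stock,
    }
    return render(request, "nasdaq.html", context)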

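You can use threading to scrape multiple pages simultaneously. The view spends most of its time waiting on network responses, so fetching the pages in parallel threads lets those waits overlap instead of adding up.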
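As a rough sketch of the threading idea, the helper below runs a fetch function over several URLs in a thread pool from the standard library. The names `fetch_all` and `fetch` are hypothetical, not part of the question's code; in the view, `fetch` would presumably be a small function wrapping `requests.get`.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=10):
    """Call fetch(url) for every URL in worker threads.

    Results come back in the same order as the input URLs.
    `fetch` is a hypothetical callable supplied by the caller.
    """
    urls = list(urls)
    if not urls:
        return []
    # Threads suit this workload because each worker mostly waits on the network.
    with ThreadPoolExecutor(max_workers=min(max_workers, len(urls))) as pool:
        return list(pool.map(fetch, urls))


# Possible use inside the view (assumes requests is imported there):
#
#     def fetch_html(url):
#         return requests.get(url, timeout=10).content
#
#     pages = fetch_all([stock.nasdaq_url for stock in this_stock], fetch_html)
```

With this shape, the total wait should be closer to the slowest single request than to the sum of all of them, although the exact gain depends on the remote site and the worker count.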

Using lxml as the parser can also speed up your scraping: install the lxml package and pass 'lxml' instead of 'html.parser' to BeautifulSoup.
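A minimal sketch of the parser swap, assuming lxml may or may not be installed: `pick_parser` and `make_soup` are hypothetical helper names, and the fallback to `html.parser` is an addition for safety rather than something the answer prescribes.

```python
import importlib.util


def pick_parser():
    """Return 'lxml' when the lxml package is installed, else 'html.parser'."""
    return "lxml" if importlib.util.find_spec("lxml") else "html.parser"


def make_soup(markup):
    """Parse markup with the fastest parser available on this machine."""
    # Imported here so pick_parser() stays usable without bs4 installed.
    from bs4 import BeautifulSoup
    return BeautifulSoup(markup, pick_parser())


# Possible use inside the view's loop:
#
#     soup = make_soup(page.content)
#     progress = soup.find_all('div', class_='ln0Gqe')
```

Parsing is probably a smaller share of the minute-long load than the network requests, so this change is best combined with fetching the pages in parallel.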

