
How can I loop over a list of keywords and append each one to the URL to fetch the results?

I am passing this URL as input: url = "https://www.amazon.in/s?k=headphones&page=1". This works fine but stops at page 19. Instead of stopping there, I want the next input to be "https://www.amazon.in/s?k=" followed by each of these in turn, and so on, in a loop:

  1. "speakers&page=1"
  2. "earbuds&page=1"

from bs4 import BeautifulSoup as soup
import pandas as pd
import requests
import urllib.request  # getdata() below calls urllib.request.Request and urlopen

data =[]

def getdata (url):
    header = { 'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)' } 
    req = urllib.request.Request(url, headers=header)
    amazon_html = urllib.request.urlopen(req).read()
    a_soup = soup(amazon_html,'html.parser')
    
    for e in a_soup.select('div[data-component-type="s-search-result"]'):
        try:
            title = e.find('h2').text
        except:
            title = None
            
        data.append({
            'title':title
        })
        
    return a_soup

def getnextpage(a_soup):
  page= a_soup.find('a',attrs={"class": 's-pagination-item s-pagination-next s-pagination-button s-pagination-separator'})
  page = page['href']
  url =  'http://www.amazon.in'+ str(page)
  return url
            
while True:
  geturl = getdata(url)
  url = getnextpage(geturl)
    
  if not url:
    break
  print(url)

output = pd.DataFrame(data)
output

This code returns the correct results, but instead of supplying a new URL by hand each time, I want it to take a list of items, append each one to the end of the URL in turn, and add the fetched results to the DataFrame. Note: the search results stop at the 19th page.

Put your keywords in a list, iterate over it, and run the while loop inside each iteration.

keywords = ['speakers','earbuds']

for k in keywords:
    url = 'https://www.amazon.in/s?k='+k
    while True:
        geturl = getdata(url)
        url = getnextpage(geturl)

        if not url:
            break
        print(url)
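
Plain string concatenation works for single-word keywords such as the two above, but a keyword containing spaces or other special characters would need to be URL-encoded first. A minimal sketch of one way to do that, assuming a hypothetical helper named build_search_url (not part of the original code) and the same k and page query parameters used in the question:

```python
from urllib.parse import urlencode

# Base search endpoint taken from the question's URL.
BASE_URL = 'https://www.amazon.in/s'

def build_search_url(keyword, page=1):
    """Hypothetical helper: return the search URL for one keyword and page."""
    # urlencode escapes spaces and special characters in the keyword.
    query = urlencode({'k': keyword, 'page': page})
    return BASE_URL + '?' + query

for kw in ['speakers', 'wireless earbuds']:
    print(build_search_url(kw))
# https://www.amazon.in/s?k=speakers&page=1
# https://www.amazon.in/s?k=wireless+earbuds&page=1
```

In the loop above, url = build_search_url(k) could then stand in for the concatenation.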

Be aware that Amazon does not like this kind of automated access to its pages and recognises the access pattern quite quickly. To reduce the request frequency a little, you should at least add a delay with time.sleep(). Of course, it would be even better to use an official API.
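
As a rough sketch of where such a delay could go, the loop can be wrapped in a function that pauses after every request. The name crawl_keywords, its parameters, and the 2-second default are assumptions made for illustration; fetch_page and next_page_url stand in for getdata and getnextpage:

```python
import time

def crawl_keywords(keywords, fetch_page, next_page_url,
                   delay_seconds=2.0, sleep=time.sleep):
    """Visit every result page for each keyword, pausing after each request.

    fetch_page and next_page_url are passed in (e.g. getdata and
    getnextpage); next_page_url must return None when there is no next page.
    """
    visited = []
    for keyword in keywords:
        url = 'https://www.amazon.in/s?k=' + keyword
        while url:
            page = fetch_page(url)      # download and parse one result page
            visited.append(url)
            url = next_page_url(page)   # None ends the loop for this keyword
            sleep(delay_seconds)        # assumed delay; tune to your needs
    return visited
```

Calling crawl_keywords(['speakers', 'earbuds'], getdata, getnextpage) would then run the same loop with a pause between requests. A fixed delay only lowers the request rate; it does not guarantee that the requests will not be blocked.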

Example
from bs4 import BeautifulSoup as soup
import pandas as pd
import requests
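import urllib.request  # import the submodule explicitly; a bare "import urllib" does not guarantee urllib.request is loaded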

data =[]

def getdata (url):
    header = { 'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)' } 
    req = urllib.request.Request(url, headers=header)
    amazon_html = urllib.request.urlopen(req).read()
    a_soup = soup(amazon_html,'html.parser')
    
    for e in a_soup.select('div[data-component-type="s-search-result"]'):
        try:
            title = e.find('h2').text
        except:
            title = None
            
        data.append({
            'title':title,
            'url':'http://www.amazon.in' + e.h2.a['href']
        })
        
    return a_soup

def getnextpage(a_soup):
    try:
        page = a_soup.find('a',attrs={"class": 's-pagination-item s-pagination-next s-pagination-button s-pagination-separator'})['href']
        url =  'http://www.amazon.in'+ str(page)
    except:
        url = None
    return url


keywords = ['speakers','earbuds']

for k in keywords:
    url = 'https://www.amazon.in/s?k='+k
    while True:
        geturl = getdata(url)
        url = getnextpage(geturl)

        if not url:
            break
        print(url)

Output (print)

http://www.amazon.in/s?k=speakers&page=2&qid=1649420352&ref=sr_pg_1
...
http://www.amazon.in/s?k=speakers&page=20&qid=1649420373&ref=sr_pg_19
http://www.amazon.in/s?k=earbuds&page=2&qid=1649420375&ref=sr_pg_1
...
http://www.amazon.in/s?k=earbuds&page=20&qid=1649420394&ref=sr_pg_19

Output (pd.DataFrame(data))

title url
0 Echo Dot (3rd Gen) - #1 smart speaker brand in India with Alexa (Black) http://www.amazon.in/gp/bestsellers/electronics/15765862031/ref=sr_bs_0_15765862031_1
1 TimbreSonic Rhythm Speaker Wired Karaoke Ultimate Sound Party Portable Speaker http://www.amazon.in/gp/slredirect/picassoRedirect.html/ref=pa_sp_atf_aps_sr_pg1_1?ie=UTF8&adId=A01688993VZM1IH2U6JB5&url=%2FTimbreSonic-Speaker-Karaoke-Ultimate-Portable%2Fdp%2FB096M2T346%2Fref%3Dsr_1_2_sspa%3Fkeywords%3Dspeakers%26qid%3D1649421227%26sr%3D8-2-spons%26psc%3D1%26smid%3DAK0P65LCJ5QQN&qualifier=1649421227&id=2899208110237385&widgetName=sp_atf
2 boAt Stone 180 5W Bluetooth Speaker with Upto 10 Hours Playback, 1.75" Driver, IPX7 and TWS Feature(Black) http://www.amazon.in/boAt-Stone-Bluetooth-Speaker-Black/dp/B08JMC1988/ref=ice_ac_b_dpb?keywords=speakers&qid=1649421227&sr=8-3
3 Speaker http://www.amazon.in/Generic-Speaker/dp/B09X5M77MZ/ref=sr_1_omk_4?keywords=speakers&qid=1649421227&sr=8-4
4 Zebronics Zeb-Warrior 2.0 Multimedia Speaker with Aux Connectivity,USB Powered and Volume Control http://www.amazon.in/gp/bestsellers/computers/1375442031/ref=sr_bs_4_1375442031_1
... ... ...
847 Zebronics Zeb-Sound Bomb 5 TWS Earbuds with Bluetooth v5.0, up to 22H Backup, Flash Connect, Splash Proof, Voice Assistant, Touch Control, 10mm Driver, Built in Microphone and Type C(Black) http://www.amazon.in/gp/slredirect/picassoRedirect.html/ref=pa_sp_mtf_aps_sr_pg20_1?ie=UTF8&adId=A09061362IHFGLF39FZ4K&url=%2FZebronics-Zeb-Sound-Bluetooth-Assistant-Microphone%2Fdp%2FB09NNNLBVD%2Fref%3Dsr_1_308_sspa%3Fkeywords%3Dearbuds%26qid%3D1649420939%26sr%3D8-308-spons%26psc%3D1&qualifier=1649420939&id=2014190349292195&widgetName=sp_mtf
848 boAt Airdopes 141 True Wireless Earbuds with 42H Playtime, Beast Mode(Low Latency Upto 80ms) for Gaming, ENx Tech, ASAP Charge, IWP, IPX4 Water Resistance, Smooth Touch Controls(Bold Black) http://www.amazon.in/gp/slredirect/picassoRedirect.html/ref=pa_sp_mtf_aps_sr_pg20_1?ie=UTF8&adId=A08646093S9SKZXE3VDX4&url=%2FboAt-Airdopes-141-Wireless-Resistance%2Fdp%2FB09N3ZNHTY%2Fref%3Dsr_1_309_sspa%3Fkeywords%3Dearbuds%26qid%3D1649420939%26sr%3D8-309-spons%26psc%3D1&qualifier=1649420939&id=2014190349292195&widgetName=sp_mtf
849 Skyfly Xbot GE100 Wired in Ear Earphones with Mic (Black) http://www.amazon.in/Skyfly-Xbot-Gaming-Earphones-Detachable/dp/B07ZYR78B3/ref=sr_1_310?keywords=earbuds&qid=1649420939&sr=8-310
850 JBL C115 TWS, True Wireless Earbuds with Mic, Jumbo 21 Hours Playtime with Quick Charge, True Bass, Dual Connect, Bluetooth 5.0, Type C & Voice Assistant Support for Mobile Phones (Black) http://www.amazon.in/gp/slredirect/picassoRedirect.html/ref=pa_sp_btf_aps_sr_pg20_1?ie=UTF8&adId=A0791293Y8WP49FN4EZU&url=%2FJBL-Wireless-Bluetooth-Assistance-Integration%2Fdp%2FB08L5ZC8R3%2Fref%3Dsr_1_311_sspa%3Fkeywords%3Dearbuds%26qid%3D1649420939%26smid%3DA14CZOWI0VEHLG%26sr%3D8-311-spons%26psc%3D1&qualifier=1649420939&id=2014190349292195&widgetName=sp_btf
851 Crossbeats Airpop Bluetooth Truly Wireless In Ear Earbuds With Mic, with 30Hrs Playtime Ultralight Bluetooth Earphone with Mic & Voice Assistant, Passive Noise Cancelling Headset, Type-C Fasting Charging - Blue http://www.amazon.in/gp/slredirect/picassoRedirect.html/ref=pa_sp_btf_aps_sr_pg20_1?ie=UTF8&adId=A10368023R9B7RAUU82SP&url=%2FCrossbeats-Bluetooth-Ultralight-Assistant-Cancelling%2Fdp%2FB09PDSVQTW%2Fref%3Dsr_1_312_sspa%3Fkeywords%3Dearbuds%26qid%3D1649420939%26sr%3D8-312-spons%26psc%3D1&qualifier=1649420939&id=2014190349292195&widgetName=sp_btf
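
Several rows above are sponsored results whose link goes through picassoRedirect.html and carries the product path in a url query parameter. If the direct product link is preferred, something along these lines might work. This is only a sketch: resolve_sponsored_url is a hypothetical helper, and it assumes the redirect format stays as it appears in the rows above.

```python
from urllib.parse import urlsplit, parse_qs, urljoin

def resolve_sponsored_url(link, base='http://www.amazon.in'):
    """Hypothetical helper: unwrap a sponsored redirect link.

    Assumes the product path is stored in the 'url' query parameter,
    as in the sample output; any other link is returned unchanged.
    """
    parts = urlsplit(link)
    if 'picassoRedirect' not in parts.path:
        return link
    target = parse_qs(parts.query).get('url')  # parse_qs also percent-decodes
    if not target:
        return link
    return urljoin(base, target[0])
```

It could be applied to the url column after the DataFrame is built, for example with output['url'].map(resolve_sponsored_url).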


 