
I have to open multiple sites one by one in Python

I am trying to build a scraper for Amazon that covers multiple pages. I have written the code below and attached its output. I need to go through all of the links, but when I try to open them only the last one is opened.

import pandas as pd
import ssl
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE     
extracted_data = []
#read the excel file
df = pd.read_excel(r'C:\Users\adelinpa\Desktop\Adelina_2\apluscontent_yes_no.xlsx')
print(df)
#convert asin column to a list
        
asin = df['asin'].tolist()
print(asin)
        
#take user input to establish MP:
marketplace = str(input('Insert MP :'))
link = []
        
        
        
if marketplace == "IT":
    for i in asin:
        length = len(asin)
        i = 0
        while i < length:
            link = "www.amazon.it/dp/" +asin[i] + '/'
            i+=1
            print(link)
        
elif marketplace == "ES":
    for i in asin:
        length = len(asin)
        i = 0
        while i < length:
            link = "www.amazon.es/dp/" + asin[i] + '/'
            i+=1
            print(link)
        
elif marketplace == "MX":
    for i in asin:
        length = len(asin)
        i = 0
        while i < length:
            link = "www.amazon.com.mx/" + asin[i] + '/'
            i += 1
            print(link)
        
elif marketplace =="BR":
    for i in asin:
        length = len(asin)
        i = 0
        while i < length:
            link = "www.amazon.com.br/dp/" + asin[i] + '/'
            i += 1
            print(link)           

Output:

www.amazon.es/dp/B07N1JPGPD/
www.amazon.es/dp/B0758LBVDC/
www.amazon.es/dp/B07MJ7GCKJ/
www.amazon.es/dp/B07N1JX25F/
www.amazon.es/dp/B07B91VB7B/
www.amazon.es/dp/B07MSKJ35L/
www.amazon.es/dp/B07M798C5Z/
www.amazon.es/dp/B07N1J7TVC/
www.amazon.es/dp/B07FR1CSWR/
www.amazon.es/dp/B07MC132XS/
www.amazon.es/dp/B07FSHLZ9H/
www.amazon.es/dp/B07M7985YJ/
www.amazon.es/dp/B07NLVRH41/
www.amazon.es/dp/B07MSKL5G5/
www.amazon.es/dp/B07B94NH1S/

If I understand you correctly, the list link should contain all links?

So far, you are not adding the links to the list. Instead, you overwrite the list variable link with a string value in each iteration, so after the loop it holds only the last link. You need to use append instead of reassigning the variable.

You are also iterating over asin multiple times by nesting two loops: the inner while loop does the same work as the outer for loop, so the whole list is processed again once for every item. A single for loop is enough, as shown below.

import pandas as pd
import ssl
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE     
extracted_data = []
#read the excel file
df = pd.read_excel(r'C:\Users\adelinpa\Desktop\Adelina_2\apluscontent_yes_no.xlsx')
print(df)
#convert asin column to a list

asin = df['asin'].tolist()
print(asin)

#take user input to establish MP:
marketplace = str(input('Insert MP :'))
link = []
 

if marketplace == "IT":
    for element in asin:
        link.append("www.amazon.it/dp/" + element + '/')

elif marketplace == "ES":
    for element in asin:
        link.append("www.amazon.es/dp/" + element + '/')

elif marketplace == "MX":
    for element in asin:
        link.append("www.amazon.com.mx/dp/" + element + '/')

elif marketplace =="BR":
    for element in asin:
        link.append("www.amazon.com.br/dp/" + element + '/')
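
To see the difference between reassigning and appending in isolation, here is a small sketch. It assumes three ASINs copied from the output in the question instead of reading the Excel file, and the names overwritten and appended exist only for this illustration.

```python
# Three ASINs copied from the output above; a stand-in for df['asin'].tolist().
asin = ["B07N1JPGPD", "B0758LBVDC", "B07MJ7GCKJ"]

# Reassigning: `link` is replaced by a new string on every pass,
# so only the last value survives the loop.
link = []
for element in asin:
    link = "www.amazon.es/dp/" + element + "/"
overwritten = link

# Appending: every string is added to the list and kept.
link = []
for element in asin:
    link.append("www.amazon.es/dp/" + element + "/")
appended = link

print(overwritten)  # a single string, the last link
print(appended)     # a list with one link per ASIN
```

The first loop leaves a single string behind, which is why only the last link could be opened; the second leaves a list with one entry per ASIN.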

A cleaner (and a lot shorter) version could be:

...
#take user input to establish MP:
marketplace = str(input('Insert MP :')).lower()

# extend marketplace address if necessary
if marketplace in ['mx', 'br']:
    marketplace = 'com.' + marketplace

links = []
for element in asin:
    links.append("www.amazon." + marketplace + "/dp/" + element + '/')
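
A variation on the same idea, sketched under a few assumptions: instead of patching the marketplace string, a dictionary can map each code to its domain, which also rejects codes that are not supported. The names DOMAINS and build_links are made up for this example, and the https:// prefix is an addition that the code above does not have (HTTP libraries generally need a scheme before they can open a URL).

```python
# Hypothetical lookup table: marketplace code -> Amazon domain
# (only the four marketplaces that appear in the question).
DOMAINS = {
    "IT": "www.amazon.it",
    "ES": "www.amazon.es",
    "MX": "www.amazon.com.mx",
    "BR": "www.amazon.com.br",
}


def build_links(marketplace, asins):
    """Return one product URL per ASIN for the given marketplace code."""
    code = marketplace.strip().upper()
    if code not in DOMAINS:
        raise ValueError("Unknown marketplace: " + repr(marketplace))
    return ["https://" + DOMAINS[code] + "/dp/" + a + "/" for a in asins]


# Example with two ASINs taken from the question's output.
links = build_links("mx", ["B07N1JPGPD", "B0758LBVDC"])
print(links)
```

With this shape, a typo in the user input fails loudly with a ValueError rather than silently producing links to a domain that does not exist.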

Thank you all for answering my question! After concatenating the Amazon domain and the ASIN, I got stuck on opening the links that I had just created.
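
For that follow-up problem, here is a minimal sketch of visiting each link in turn, assuming the pages are fetched with the standard library's urllib.request. The names with_scheme and fetch_all, and the fetch parameter, are hypothetical. One thing that is certain: urlopen raises ValueError for a URL without a scheme, so "www.amazon.es/dp/..." has to become "https://www.amazon.es/dp/..." first. Whether Amazon serves the page to a script is a separate matter, so failures are caught per link and do not stop the loop.

```python
import urllib.request


def with_scheme(url):
    """Prefix https:// when the URL has no scheme, as in the links built above."""
    if url.startswith(("http://", "https://")):
        return url
    return "https://" + url


def fetch_all(links, fetch=None):
    """Open each link in turn and return {url: page bytes, or None on failure}.

    `fetch` is a hypothetical hook (url -> bytes) so the loop can be tried
    without network access; by default it uses urllib.request.urlopen.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=30) as response:
                return response.read()

    pages = {}
    for link in links:
        url = with_scheme(link)
        try:
            pages[url] = fetch(url)
        except Exception as error:  # keep going if one page fails
            print("Could not open", url, "-", error)
            pages[url] = None
    return pages
```

Calling fetch_all(link) with the list built by the answer's loop would perform the real requests, one after another, and return the raw page content keyed by URL.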
