I have to open multiple sites one by one in Python

I am trying to build an Amazon scraper that covers multiple pages. I have written the code below and attached its output as well. I need to go through all of the links, but when I try to open them, only the last one is opened.

import pandas as pd
import ssl
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE     
extracted_data = []
#read the excel file
df = pd.read_excel(r'C:\Users\adelinpa\Desktop\Adelina_2\apluscontent_yes_no.xlsx')
print(df)
#convert asin column to a list
        
asin = df['asin'].tolist()
print(asin)
        
#take user input to establish MP:
marketplace = str(input('Insert MP :'))
link = []
        
        
        
if marketplace == "IT":
    for i in asin:
        length = len(asin)
        i = 0
        while i < length:
            link = "www.amazon.it/dp/" +asin[i] + '/'
            i+=1
            print(link)
        
elif marketplace == "ES":
    for i in asin:
        length = len(asin)
        i = 0
        while i < length:
            link = "www.amazon.es/dp/" + asin[i] + '/'
            i+=1
            print(link)
        
elif marketplace == "MX":
    for i in asin:
        length = len(asin)
        i = 0
        while i < length:
            link = "www.amazon.com.mx/" + asin[i] + '/'
            i += 1
            print(link)
        
elif marketplace =="BR":
    for i in asin:
        length = len(asin)
        i = 0
        while i < length:
            link = "www.amazon.com.br/dp/" + asin[i] + '/'
            i += 1
            print(link)           

Output:

www.amazon.es/dp/B07N1JPGPD/
www.amazon.es/dp/B0758LBVDC/
www.amazon.es/dp/B07MJ7GCKJ/
www.amazon.es/dp/B07N1JX25F/
www.amazon.es/dp/B07B91VB7B/
www.amazon.es/dp/B07MSKJ35L/
www.amazon.es/dp/B07M798C5Z/
www.amazon.es/dp/B07N1J7TVC/
www.amazon.es/dp/B07FR1CSWR/
www.amazon.es/dp/B07MC132XS/
www.amazon.es/dp/B07FSHLZ9H/
www.amazon.es/dp/B07M7985YJ/
www.amazon.es/dp/B07NLVRH41/
www.amazon.es/dp/B07MSKL5G5/
www.amazon.es/dp/B07B94NH1S/

If I understand you correctly, the list link should contain all the links?

So far you are not adding the links to the list; in each iteration you overwrite the list variable link with a string value, so only the last link survives. Use append instead of assigning to the variable.

You are also iterating over asin more often than necessary by nesting two loops: the inner while loop does the same work as the outer for loop, so it runs in full once for every item. A single for loop, as shown below, is enough.

import pandas as pd
import ssl
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE     
extracted_data = []
#read the excel file
df = pd.read_excel(r'C:\Users\adelinpa\Desktop\Adelina_2\apluscontent_yes_no.xlsx')
print(df)
#convert asin column to a list

asin = df['asin'].tolist()
print(asin)

#take user input to establish MP:
marketplace = str(input('Insert MP :'))
link = []
 

if marketplace == "IT":
    for element in asin:
        link.append("www.amazon.it/dp/" + element + '/')

elif marketplace == "ES":
    for element in asin:
        link.append("www.amazon.es/dp/" + element + '/')

elif marketplace == "MX":
    for element in asin:
        link.append("www.amazon.com.mx/dp/" + element + '/')

elif marketplace =="BR":
    for element in asin:
        link.append("www.amazon.com.br/dp/" + element + '/')
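
To make the difference between assigning and appending concrete, here is a minimal sketch. The three ASINs are taken from the output in the question; the variable names asins, link and links are chosen only for this illustration.

```python
asins = ["B07N1JPGPD", "B0758LBVDC", "B07MJ7GCKJ"]

# Assignment: every pass replaces the previous value,
# so after the loop only the last link is left (as a plain string).
link = []
for asin in asins:
    link = "www.amazon.es/dp/" + asin + "/"
print(link)

# append: every pass adds one element, so all links are kept in the list.
links = []
for asin in asins:
    links.append("www.amazon.es/dp/" + asin + "/")
print(links)
```

After the first loop link is no longer a list at all, which is why any code that opens link later only ever sees one URL.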

A cleaner (and much shorter) version could be:

...
#take user input to establish MP:
marketplace = str(input('Insert MP :')).lower()

# extend marketplace address if necessary
if marketplace in ['mx', 'br']:
    marketplace = 'com.' + marketplace

links = []
for element in asin:
    links.append("www.amazon." + marketplace + "/dp/" + element + '/')
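
One possible variation, sketched under a few assumptions: the marketplace codes are kept in a lookup table so that an unknown code fails loudly instead of producing a broken address, and the links get an https:// prefix because most HTTP clients reject a URL without a scheme. The names DOMAINS and build_links are hypothetical and not part of the original code.

```python
# Hypothetical lookup table: marketplace code -> domain suffix used in the question.
DOMAINS = {"IT": "it", "ES": "es", "MX": "com.mx", "BR": "com.br"}


def build_links(asins, marketplace):
    """Return one product URL per ASIN for the given marketplace code."""
    code = marketplace.strip().upper()
    if code not in DOMAINS:
        raise ValueError(f"Unknown marketplace: {marketplace!r}")
    # The https:// scheme is an addition; the original links had no scheme.
    return [f"https://www.amazon.{DOMAINS[code]}/dp/{asin}/" for asin in asins]


links = build_links(["B07N1JPGPD", "B0758LBVDC"], "es")
print(links)
```

Adding another marketplace then only means adding one entry to the table.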

Thank you all for answering my question! After concatenating the Amazon domain and the ASIN, I got stuck on opening the links I had just created.
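
For the follow-up about opening the links, one way it could look is sketched below using only the standard library. It is an illustration rather than a tested scraper: fetch and open_all are hypothetical names, the User-Agent header and the 10-second timeout are assumptions, and the site may still refuse or throttle automated requests. The links built in this thread have no scheme, so the sketch adds https:// before opening them.

```python
import urllib.request


def fetch(url, timeout=10):
    """Download one page and return its raw bytes (assumed header and timeout)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def open_all(links, fetch_page=fetch):
    """Open every link in turn; return a dict of url -> page bytes (None on failure)."""
    pages = {}
    for url in links:
        # urlopen needs a scheme, and the generated links start with "www."
        if not url.startswith(("http://", "https://")):
            url = "https://" + url
        try:
            pages[url] = fetch_page(url)
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            print(f"Failed to open {url}: {exc}")
            pages[url] = None
    return pages
```

Calling open_all(link) with the list built by the answer's loop would then visit each address once, in order. The fetch_page parameter exists only so that the download step can be swapped out, for example for a different HTTP client.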
