Retrieving data from web

I have this code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
driver_BV= webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver_BV.get("https://www.qvc.com/handbags-and-luggage/handbags/clutches/_/N-1cknw/c.html?qq=mh")
product_BV=[]
price_BV=[]
elementHTML=driver_BV.find_element("class name", 'productInfoWrapper')
Final=[]
children_element=elementHTML.find_elements("class name", 'plContent')

print('''
      a. Retrieve data
      b. Create the graph
      c. Display the matrix
      d. Save to Excel file
      e. Exit
      ''')

while True:
    select_option_BV = input("Select option:")
    if select_option_BV == 'a':
        for child_element in children_element:
            title=child_element.find_element("class name", 'productDesc').get_attribute('innerText')
            product_BV.append(title)
            titlu=child_element.find_element("class name", 'priceSell')
            price=titlu.get_attribute('innerText')
            price_BV.append(price)
            print ('Products:', product_BV)
            print ('Prices:', price_BV)
            price=price.replace("$","")
            Final.append(float(price))
            Product_title_series=pd.Series(product_BV)
            Product_price_series=pd.Series(Final)
            product_rows={"Product name":Product_title_series, "Price":Product_price_series}
            Product_Matrix_Framework=pd.DataFrame(product_rows)
    elif select_option_BV == 'b':
        Product_Matrix_Framework.plot(x="Product name",y="Price")
    elif select_option_BV == 'c':
        print(Product_Matrix_Framework.sort_values("Price"))
    elif select_option_BV == 'd':
        Product_Matrix_Framework.to_excel("Products.xlsx")
    elif select_option_BV == 'e':
        print("CY@ exiting...")
        break

I don't know what mistake I made, but I can't make it work. I need it for a university project and I'm stuck: when I type "a" in the console nothing happens, and if I type any other letter I get: "name 'Product_Matrix_Framework' is not defined". Please help! Thank you.

I think elementHTML isn't necessary. You can get children_element directly by searching with a CSS selector.

...
driver_BV.get("https://www.qvc.com/handbags-and-luggage/handbags/clutches/_/N-1cknw/c.html?qq=mh")
product_BV=[]
price_BV=[]
Final=[]
children_element=driver_BV.find_elements("css selector", ".plContent .galleryItem")

It will find 60 items. After that, errors still occur, and I think you need to fix them in the for loop.
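One thing visible in the posted code is that Product_Matrix_Framework is only assigned inside the body of the for loop, so if the selectors match nothing, option "a" prints nothing and the other options raise the NameError. A minimal sketch of one way to restructure this, without Selenium or pandas so it can be run on its own: the helper names parse_price, build_rows and handle_option and the sample data are hypothetical, and prices are assumed to be US-style strings such as "$1,049.00".

```python
def parse_price(text):
    """Turn a US-style price string such as '$1,049.00' into a float."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned)


def build_rows(scraped):
    """Build one row per (title, price_text) pair, once scraping has finished."""
    return [{"Product name": title, "Price": parse_price(price)}
            for title, price in scraped]


def handle_option(option, rows):
    """Return a message for options b, c and d instead of failing on missing data."""
    if rows is None:
        return "No data yet, run option 'a' first."
    if not rows:
        return "Option 'a' ran but matched no products; check the selectors."
    return f"{len(rows)} products ready for option '{option}'."


# Hypothetical sample standing in for the text read from the page.
sample = [("Leather clutch", "$24.98"), ("Evening bag", "$1,049.00")]
rows = build_rows(sample)
print(handle_option("b", None))
print(handle_option("c", rows))
```

In the real script, rows would start as None before the while loop and be assigned once after the for loop in option "a"; a list of dicts like this can then be passed to pd.DataFrame(rows).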

After looking at some documentation sites and the website itself (I'm assuming you want the elements with the class productDesc), I think I see what you want to do.

If you want to select elements with a CSS selector (productDesc is a class name, so the selector is .productDesc), you should use:

title_elements = child_element.find_elements("css selector", ".productDesc")

This should return a list containing all the child elements matching that selector; you can then iterate through the list to get the text of each element. Something like:

titles = []
for title_element in title_elements:
    titles.append(title_element.get_attribute("innerText"))

Looking at the website, each child_element may contain one or more elements with the productDesc class, so you should store them in a list in case there is more than one. Your code appears to assume that there is only one.

For example:


...
while True:
    select_option_BV = input("Select option:")
    if select_option_BV == 'a':
        for child_element in children_element:
            titles = []
            for title_element in child_element.find_elements("css selector", ".productDesc"):
                titles.append(title_element.get_attribute("innerText"))

            product_BV.append(titles)  # product_BV will now be a list containing more lists
            ...

@Nathcat

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
driver_RA= webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver_RA.get("https://www.qvc.com/handbags-and-luggage/handbags/clutches/_/N-1cknw/c.html?qq=mh")
product_RA=[]
price_RA=[]
elementHTML=driver_RA.find_element("class name", 'plContent')
Final=[]
children_element=elementHTML.find_elements("class name", 'productInfoWrapper')

print('''
      a. Retrieve data
      b. Create the graph
      c. Display the matrix
      d. Save to Excel file
      e. Exit
      ''')

while True:
    select_option_RA = input("Select option:")
    if select_option_RA == 'a':
        for child_element in children_element:
            title=child_element.find_element("class name", 'productDesc').get_attribute('innerText')
            #Trying to print every SECOND string from productDesc (because for some reason every
            #first innerText from productDesc is empty 
            product_RA.append(title)
            titlu=child_element.find_element("class name", 'priceSell')
            price=titlu.get_attribute('innerText')
            price_RA.append(price)
            print ('Products:', product_RA)
            print ('Prices:', price_RA)
            price=price.replace("€","")
            Final.append(price)
            Product_title_series=pd.Series(product_RA)
            Product_price_series=pd.Series(Final)
            product_rows={"Product name":Product_title_series, "Price":Product_price_series}
            Product_Matrix_Framework=pd.DataFrame(product_rows)
    elif select_option_RA == 'b':
        Product_Matrix_Framework.plot(x="Product name",y="Price")
    elif select_option_RA == 'c':
        print(Product_Matrix_Framework.sort_values("Price"))
    elif select_option_RA == 'd':
        Product_Matrix_Framework.to_excel("Products.xlsx")
    elif select_option_RA == 'e':
        print("Exiting beep boop beep.")
        break

I did this, but I can't get the products: if I run it and type "a", it shows the prices but not the product names.
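The comment in the code above says that the first productDesc innerText of each product is empty. One possible workaround, sketched here under that assumption, is to collect the text of every productDesc match and keep the first one that is not blank. The helper name first_non_empty and the sample values are hypothetical.

```python
def first_non_empty(texts):
    """Return the first string that is not blank after stripping, or '' if none."""
    for text in texts:
        # get_attribute can return None, so check for that before stripping
        if text and text.strip():
            return text.strip()
    return ""


# Hypothetical values standing in for the innerText of each productDesc match.
texts = ["", "  Leather clutch \n"]
print(first_non_empty(texts))
```

With Selenium, the list could be built per product with something like [el.get_attribute("innerText") for el in child_element.find_elements("class name", "productDesc")], and the result appended to product_RA.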
