Retrieving data from the web

Question

I have this code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
driver_BV= webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver_BV.get("https://www.qvc.com/handbags-and-luggage/handbags/clutches/_/N-1cknw/c.html?qq=mh")
product_BV=[]
price_BV=[]
elementHTML=driver_BV.find_element("class name", 'productInfoWrapper')
Final=[]
children_element=elementHTML.find_elements("class name", 'plContent')

print('''
      a. Retrieve data
      b. Create the graph
      c. Display the matrix
      d. Save to Excel file
      e. Exit
      ''')

while True:
    select_option_BV = input("Select option:")
    if select_option_BV == 'a':
        for child_element in children_element:
            title=child_element.find_element("class name", 'productDesc').get_attribute('innerText')
            product_BV.append(title)
            titlu=child_element.find_element("class name", 'priceSell')
            price=titlu.get_attribute('innerText')
            price_BV.append(price)
            print ('Products:', product_BV)
            print ('Prices:', price_BV)
            price=price.replace("$","")
            Final.append(float(price))
            Product_title_series=pd.Series(product_BV)
            Product_price_series=pd.Series(Final)
            product_rows={"Product name":Product_title_series, "Price":Product_price_series}
            Product_Matrix_Framework=pd.DataFrame(product_rows)
    elif select_option_BV == 'b':
        Product_Matrix_Framework.plot(x="Product name",y="Price")
    elif select_option_BV == 'c':
        print(Product_Matrix_Framework.sort_values("Price"))
    elif select_option_BV == 'd':
        Product_Matrix_Framework.to_excel("Products.xlsx")
    elif select_option_BV == 'e':
        print("CY@ exiting...")
        break

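I don't know what mistake I'm making, but I can't get it to work! I need it for my university project and I'm stuck. When I type "a" in the console, nothing happens. If I type any other letter, it says: "name 'Product_Matrix_Framework' is not defined". Please help! Thank you.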
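The NameError follows from where Product_Matrix_Framework is assigned: only inside the for body. If children_element is empty, which is what find_elements returns when the locator matches nothing, the body never runs and the name is never bound. The minimal sketch below reproduces this without Selenium. Plain dicts stand in for WebElements, and the function names original_option_a and fixed_option_a are hypothetical. Inside a function the error appears as UnboundLocalError, which is a subclass of NameError.

```python
def original_option_a(children_element):
    """Mirrors the question: the table is assigned only inside the loop body."""
    for child in children_element:
        table = [(child["title"], child["price"])]
    # If children_element is empty, 'table' was never assigned.
    return table


def fixed_option_a(children_element):
    """Collects rows in the loop and returns the table once, after it."""
    rows = []  # bound before the loop, so it exists even when nothing matched
    for child in children_element:
        rows.append((child["title"], child["price"]))
    if not rows:
        print("No products found; check the locator passed to find_elements.")
    return rows


try:
    original_option_a([])  # an empty list, as when the selector matches nothing
except NameError as err:  # UnboundLocalError is a subclass of NameError
    print(type(err).__name__)

print(fixed_option_a([{"title": "Clutch", "price": "$10.00"}]))
```

The same idea applies to the original script: create the lists (and, if needed, an empty DataFrame) before the menu loop, and build the DataFrame once after the for loop instead of on every iteration.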

Answer 1

I don't think elementHTML is needed. You can get children_element directly by searching with a CSS selector.

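from selenium.webdriver.common.by import By
...
driver_BV.get("https://www.qvc.com/handbags-and-luggage/handbags/clutches/_/N-1cknw/c.html?qq=mh")
product_BV=[]
price_BV=[]
Final=[]
# Selenium 4 locator style; find_elements_by_css_selector was deprecated in Selenium 4 and later removed
children_element=driver_BV.find_elements(By.CSS_SELECTOR, ".plContent .galleryItem")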

This finds 60 items. After that, errors keep occurring, and I think you need to fix those inside the for loop.
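The answer does not say which errors occur. One plausible candidate (an assumption, not confirmed above) is float(price.replace("$","")), which fails whenever the price text is anything other than a bare number, for example an empty string, a thousands separator, or a price range. A more tolerant parser, sketched below with a hypothetical parse_price helper, extracts the first number and returns None when there is none:

```python
import re
from typing import Optional

# First run of digits, optionally with thousands separators and a decimal part.
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_price(text: str) -> Optional[float]:
    """Return the first number in a price string such as '$1,299.00', or None."""
    match = _PRICE_RE.search(text)
    if match is None:
        return None  # e.g. an empty string from a missing or hidden element
    return float(match.group().replace(",", ""))


for raw in ["$49.98", "€1,299.00", "$29.98 - $39.98", ""]:
    print(repr(raw), "->", parse_price(raw))
```

Because it ignores the currency symbol, the same helper works whether the page shows "$" or "€". For a price range it keeps only the lower bound, which may or may not be what you want.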

Answer 2

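After looking at some documentation sites and at the website itself (I'm assuming you want the elements with the CSS class productDesc), I think I understand what you're trying to do.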

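If you want to select elements by CSS selector (here the class productDesc, which as a CSS selector is written .productDesc, with a leading dot), you should use:

title_elements = child_element.find_elements(By.CSS_SELECTOR, ".productDesc")  # needs: from selenium.webdriver.common.by import By

This should return a list of all child elements with the class productDesc. You can then loop over that list to get each element's text. Something like:

titles = []
for title_element in title_elements:
    titles.append(title_element.get_attribute("innerText"))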

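Looking at the site, each child_element may contain one or more elements with the productDesc class, so you should store them in a list in case there are several. Your code seems to assume there is exactly one.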

For example:


from selenium.webdriver.common.by import By
...
while True:
    select_option_BV = input("Select option:")
    if select_option_BV == 'a':
        for child_element in children_element:
            titles = []
            # ".productDesc" (with the dot) matches elements whose class is productDesc
            for title_element in child_element.find_elements(By.CSS_SELECTOR, ".productDesc"):
                titles.append(title_element.get_attribute("innerText"))

            product_BV.append(titles)  # product_BV will now be a list of lists
            ...
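Since product_BV becomes a list of lists in this version, it no longer lines up one-to-one with the price list when both are turned into pd.Series. One option, sketched below under the assumption that the useful title text is whatever is non-empty among a card's productDesc elements, is to collapse each inner list into a single string before appending. The helper name join_title_parts and the sample strings are hypothetical.

```python
def join_title_parts(texts):
    """Collapse the innerText values of one product's productDesc elements
    into a single title, skipping None and whitespace-only pieces."""
    parts = [t.strip() for t in texts if t and t.strip()]
    return " ".join(parts)


# Hypothetical innerText values for two product cards.
per_product = [["", "Leather Clutch"], ["Evening Bag"]]
product_BV = [join_title_parts(texts) for texts in per_product]
print(product_BV)  # ['Leather Clutch', 'Evening Bag']
```

In the loop above, that would mean product_BV.append(join_title_parts(titles)) instead of appending the raw list.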

Answer 3

@Nathcat

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
driver_RA= webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver_RA.get("https://www.qvc.com/handbags-and-luggage/handbags/clutches/_/N-1cknw/c.html?qq=mh")
product_RA=[]
price_RA=[]
elementHTML=driver_RA.find_element("class name", 'plContent')
Final=[]
children_element=elementHTML.find_elements("class name", 'productInfoWrapper')

print('''
      a. Retrieve data
      b. Create the graph
      c. Display the matrix
      d. Save to Excel file
      e. Exit
      ''')

while True:
    select_option_RA = input("Select option:")
    if select_option_RA == 'a':
        for child_element in children_element:
            title=child_element.find_element("class name", 'productDesc').get_attribute('innerText')
            #Trying to print every SECOND string from productDesc (because for some reason every
            #first innerText from productDesc is empty)
            product_RA.append(title)
            titlu=child_element.find_element("class name", 'priceSell')
            price=titlu.get_attribute('innerText')
            price_RA.append(price)
            print ('Products:', product_RA)
            print ('Prices:', price_RA)
            price=price.replace("€","")
            Final.append(price)
            Product_title_series=pd.Series(product_RA)
            Product_price_series=pd.Series(Final)
            product_rows={"Product name":Product_title_series, "Price":Product_price_series}
            Product_Matrix_Framework=pd.DataFrame(product_rows)
    elif select_option_RA == 'b':
        Product_Matrix_Framework.plot(x="Product name",y="Price")
    elif select_option_RA == 'c':
        print(Product_Matrix_Framework.sort_values("Price"))
    elif select_option_RA == 'd':
        Product_Matrix_Framework.to_excel("Products.xlsx")
    elif select_option_RA == 'e':
        print("Exiting beep boop beep.")
        break

I did this, but I can't get the products: if I run it and type "a", it shows the prices but not the product names.
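The comment in the code above says the first productDesc text in each card is empty. Assuming that observation holds, a sketch is to read all productDesc texts per card, keep the first non-empty one, and append the title and price together so the two columns cannot drift apart. Plain tuples stand in for each card's scraped strings, and the names first_non_empty and collect_rows are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def first_non_empty(texts: Iterable[Optional[str]]) -> str:
    """Return the first text that is not None or whitespace-only, else ''."""
    for text in texts:
        if text and text.strip():
            return text.strip()
    return ""


def collect_rows(cards: Iterable[Tuple[List[Optional[str]], str]]) -> List[Tuple[str, str]]:
    """Build one (title, price) row per card, skipping cards with no title."""
    rows = []
    for desc_texts, price_text in cards:
        title = first_non_empty(desc_texts)
        if title:  # skip instead of appending a blank product name
            rows.append((title, price_text))
    return rows


cards = [(["", "Leather Clutch"], "$49.98"), ([""], "$19.99"), (["Evening Bag"], "$29.00")]
print(collect_rows(cards))
```

With Selenium, desc_texts for one card could come from [e.get_attribute("innerText") for e in child_element.find_elements(By.CLASS_NAME, "productDesc")], using find_elements (plural) rather than find_element, which only ever returns the first, possibly empty, match.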

Answer 1: score 1, posted 2022-05-23 17:58:40

Answer 2: score 1, posted 2022-05-23 18:04:15

Answer 3: score 0, posted 2022-05-24 16:50:37