Retrieving data from web

I have this code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
driver_BV= webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver_BV.get("https://www.qvc.com/handbags-and-luggage/handbags/clutches/_/N-1cknw/c.html?qq=mh")
product_BV=[]
price_BV=[]
elementHTML=driver_BV.find_element("class name", 'productInfoWrapper')
Final=[]
children_element=elementHTML.find_elements("class name", 'plContent')

print('''
      a. Retrieve data
      b. Create the graph
      c. Display the matrix
      d. Save to Excel file
      e. Exit
      ''')

while True:
    select_option_BV = input("Select option:")
    if select_option_BV == 'a':
        for child_element in children_element:
            title=child_element.find_element("class name", 'productDesc').get_attribute('innerText')
            product_BV.append(title)
            titlu=child_element.find_element("class name", 'priceSell')
            price=titlu.get_attribute('innerText')
            price_BV.append(price)
            print ('Products:', product_BV)
            print ('Prices:', price_BV)
            price=price.replace("$","")
            Final.append(float(price))
            Product_title_series=pd.Series(product_BV)
            Product_price_series=pd.Series(Final)
            product_rows={"Product name":Product_title_series, "Price":Product_price_series}
            Product_Matrix_Framework=pd.DataFrame(product_rows)
    elif select_option_BV == 'b':
        Product_Matrix_Framework.plot(x="Product name",y="Price")
    elif select_option_BV == 'c':
        print(Product_Matrix_Framework.sort_values("Price"))
    elif select_option_BV == 'd':
        Product_Matrix_Framework.to_excel("Products.xlsx")
    elif select_option_BV == 'e':
        print("CY@ exiting...")
        break

And I don't know what mistake I made, but I can't make it work! I need it for a university project and I'm stuck on it right now. When I type "a" in the console it doesn't do anything, and if I type any other letter it says: "name 'Product_Matrix_Framework' is not defined". Please help! Thank you.

I think elementHTML isn't necessary. You can get children_element directly by searching with a CSS selector:

...
driver_BV.get("https://www.qvc.com/handbags-and-luggage/handbags/clutches/_/N-1cknw/c.html?qq=mh")
product_BV=[]
price_BV=[]
Final=[]
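# find_elements_by_css_selector() was removed in Selenium 4.3; pass the strategy to find_elements() instead
children_element=driver_BV.find_elements("css selector", ".plContent .galleryItem")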

This finds 60 product items. After that, errors still occur, so I think you need to fix the code inside the for loop.
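To see why typing "a" seems to do nothing in the original code: find_elements() returns an empty list when nothing matches, so the for loop body never runs and Product_Matrix_Framework is never assigned; options b, c and d then fail with the NameError. If productInfoWrapper sits inside plContent (the follow-up code later in this thread nests them that way), searching for plContent inside productInfoWrapper would match nothing. Below is a minimal, stdlib-only sketch of that control flow with a guard added. The names retrieve, run_option and the state dict are hypothetical, and plain lists stand in for the WebElements and the DataFrame.

```python
def retrieve(children):
    """Collect one row per scraped child element; returns [] if `children` is empty."""
    rows = []
    for child in children:  # zero iterations when the selector matched nothing
        rows.append(child)
    return rows


def run_option(option, children, state):
    """Hypothetical menu handler; state["table"] holds the data once option 'a' has built it."""
    if option == "a":
        rows = retrieve(children)
        if not rows:
            return "No products found: check the selectors used to build children_element"
        state["table"] = rows
        return f"Retrieved {len(rows)} products"
    if state.get("table") is None:
        # The original code assigns the table only inside the for body, so with an
        # empty children list it never exists and b/c/d fail with a NameError.
        return "No data yet: run option 'a' first"
    return f"Table has {len(state['table'])} rows"


state = {}
print(run_option("a", [], state))  # empty selector result: the loop body never runs
print(run_option("c", [], state))
print(run_option("a", ["Clutch A", "Clutch B"], state))
print(run_option("c", [], state))
```

Building the DataFrame once, after the loop has finished, instead of rebuilding it on every iteration, is also worth considering.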

After looking at some documentation sites and the website itself (I'm assuming you want the elements with the class productDesc), I think I see what you want to do.

If you want to select elements by their class (productDesc is a class name here; as a CSS selector it is written .productDesc, with a leading dot), you should use:

title_elements = child_element.find_elements("css selector", ".productDesc")

This returns a list of all descendant elements with the class productDesc, and you can then iterate through that list to get the text of each element. Something like:

titles = []
for title_element in title_elements:
    # use the loop variable; innerText is the element's visible text
    titles.append(title_element.get_attribute("innerText"))

Looking at the website, each child_element may contain one or more elements with the class productDesc, so you should collect them in a list in case there is more than one. Your code appears to assume that there is only one.

For example:

...
while True:
    select_option_BV = input("Select option:")
    if select_option_BV == 'a':
        for child_element in children_element:
            titles = []
            for title_element in child_element.find_elements("css selector", ".productDesc"):
                titles.append(title_element.get_attribute("innerText"))

            product_BV.append(titles)  # product_BV is now a list of lists: one list of titles per product
            ...
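Because product_BV now holds a list of titles per product, it likely needs to be turned back into one string per product before it goes into pd.Series/pd.DataFrame next to the prices. A minimal sketch, assuming blank productDesc texts should be dropped and any remaining ones joined with a space; flatten_titles is a hypothetical helper and the sample data is made up. Whether joining is the right choice depends on what the extra productDesc elements actually contain, which this thread doesn't show.

```python
def flatten_titles(nested_titles):
    """Turn [[text, ...], ...] into one title string per product.

    Blank, whitespace-only, or None texts (e.g. an empty innerText) are dropped and
    the rest are joined with a single space; a product with no usable text becomes "".
    """
    flat = []
    for titles in nested_titles:
        parts = [t.strip() for t in titles if t and t.strip()]
        flat.append(" ".join(parts))
    return flat


product_BV = [["", "Clutch A"], ["Evening Bag", "Red"], [""]]  # made-up sample data
print(flatten_titles(product_BV))  # ['Clutch A', 'Evening Bag Red', '']
```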

@Nathcat

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
driver_RA= webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver_RA.get("https://www.qvc.com/handbags-and-luggage/handbags/clutches/_/N-1cknw/c.html?qq=mh")
product_RA=[]
price_RA=[]
elementHTML=driver_RA.find_element("class name", 'plContent')
Final=[]
children_element=elementHTML.find_elements("class name", 'productInfoWrapper')

print('''
      a. Retrieve data
      b. Create the graph
      c. Display the matrix
      d. Save to Excel file
      e. Exit
      ''')

while True:
    select_option_RA = input("Select option:")
    if select_option_RA == 'a':
        for child_element in children_element:
            title=child_element.find_element("class name", 'productDesc').get_attribute('innerText')
            # Trying to print every SECOND string from productDesc (for some reason,
            # every first innerText from productDesc is empty)
            product_RA.append(title)
            titlu=child_element.find_element("class name", 'priceSell')
            price=titlu.get_attribute('innerText')
            price_RA.append(price)
            print ('Products:', product_RA)
            print ('Prices:', price_RA)
            price=price.replace("€","")
            Final.append(price)
            Product_title_series=pd.Series(product_RA)
            Product_price_series=pd.Series(Final)
            product_rows={"Product name":Product_title_series, "Price":Product_price_series}
            Product_Matrix_Framework=pd.DataFrame(product_rows)
    elif select_option_RA == 'b':
        Product_Matrix_Framework.plot(x="Product name",y="Price")
    elif select_option_RA == 'c':
        print(Product_Matrix_Framework.sort_values("Price"))
    elif select_option_RA == 'd':
        Product_Matrix_Framework.to_excel("Products.xlsx")
    elif select_option_RA == 'e':
        print("Exiting beep boop beep.")
        break

I did this, but I still can't get the products: if I run it and type "a", it shows the prices but not the product names.
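find_element() returns only the first matching element, and according to the comment in the code above, the first productDesc inside each productInfoWrapper has an empty innerText, so the product name comes back blank while the price works. One way around it, assuming the name is in a later productDesc element, is to read all of them with find_elements() and keep the first non-blank text. first_non_empty is a hypothetical helper, the sample values are made up, and the Selenium call appears only in a comment so the sketch runs without a browser.

```python
def first_non_empty(texts):
    """Return the first text with visible characters (stripped), or "" if none.

    get_attribute() can return None, so None entries are skipped as well.
    """
    for text in texts:
        if text and text.strip():
            return text.strip()
    return ""


# Inside the loop over children_element, the texts would come from something like:
#   texts = [e.get_attribute("innerText")
#            for e in child_element.find_elements("class name", "productDesc")]
#   title = first_non_empty(texts)
texts = ["", "  Example Clutch  ", "Second description"]  # made-up sample values
print(first_non_empty(texts))  # Example Clutch
```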
