
Python BeautifulSoup: if-in statement doesn't work correctly

I have a problem with the web scraping code below. The code works, but if the entered product is more than a single word, for example "Playstation 4", it fails. The problem seems to be in this line: if product in str(product_name):

I tried many different variations, such as product_name.text or product_name.string, but when product is more than one word, the check never correctly finds the string product in the converted object product_name.

If I use print(product_name.text), I get exactly the result I would expect. So why can't I use the if-in statement correctly with product_name.text or str(product_name)?

import requests
from bs4 import BeautifulSoup

product = input("Please enter product: ")

URL = "http://www.somewebsite.com/search?sSearch=" + product

website = requests.get(URL)

html = BeautifulSoup(website.text, 'html.parser')

product_info = html.find_all('div', class_="product--main")


product_array = []
for product_details in product_info:
    product_name = product_details.find('a', class_="product--title product--title-desktop")
    if product in str(product_name):
        product_array.append(product_name.text.replace('\n', '')+'; ')
        discounted_price = product_details.find('span', class_="price--default is--discount")
        if discounted_price:
            product_array.append(discounted_price.text.replace('\n', '').replace('\xa0€*','').replace('from','') + ';\n')
        else:
            regular_price = product_details.find('span', class_="price--default")
            product_array.append(regular_price.text.replace('\n', '').replace('\xa0€*','').replace('from','') + ';\n' if regular_price else 'N/A;\n')

with open("data.csv", "w", encoding="utf-8") as text_file:
    text_file.write("product; price;\n")
    for row in product_array:
        text_file.write(row)
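Separately from how the request is built, keep in mind that `in` is an exact, case-sensitive substring test, and that `str(product_name)` also contains the tag's HTML markup, not just its text. If a title differs from the input only in capitalization (`PlayStation` vs `Playstation`), in line breaks, or in a no-break space (`\xa0`, which the price cleanup in this code already has to strip), the test fails even though the printed text looks right. Whether this site's titles actually differ in that way is an assumption. A minimal sketch that normalizes both sides before comparing, using the hypothetical helpers `normalize` and `title_matches`:

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and ignore case so cosmetic differences don't matter."""
    # For str patterns, \s matches Unicode whitespace, including newlines,
    # tabs and the no-break space U+00A0; squeeze each run into one space.
    collapsed = re.sub(r"\s+", " ", text).strip()
    # casefold() is a more aggressive lower() intended for caseless matching.
    return collapsed.casefold()


def title_matches(query: str, title_text: str) -> bool:
    """Return True if the normalized query occurs in the normalized title."""
    return normalize(query) in normalize(title_text)


# Hypothetical use inside the question's loop:
#     if product_name is not None and title_matches(product, product_name.get_text()):
#         ...
```

Comparing against product_name.get_text() rather than str(product_name) also keeps the match restricted to the visible title instead of attribute values in the surrounding markup.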

Why should I use urlencode?

You wrote: "I tried many different variations like product_name.text or product_name.string, but it won't correctly check if the string product is in the converted object product_name if it is not just one word."

URL = "http://www.somewebsite.com/search?sSearch=" + product

Look at what happens to the query string when you build it by plain concatenation: [screenshot not preserved]
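To reproduce the effect without a live site, the sketch below uses only the standard library's urllib.parse: it builds the query string both ways and decodes it again with parse_qs, roughly the way a server would. The product names "C++ Primer" and "Tom & Jerry" are illustrative assumptions chosen because they contain reserved characters. A raw space, as in "Playstation 4", is not valid in a URL either; whether a client silently fixes it up differs between libraries.

```python
from urllib.parse import parse_qs, urlencode


def concatenated(term: str) -> str:
    # What the question does: paste the raw input into the query string.
    return "sSearch=" + term


def encoded(term: str) -> str:
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return urlencode({"sSearch": term})


for term in ["C++ Primer", "Tom & Jerry"]:
    print(repr(term))
    print("  concatenated:", concatenated(term), "->", parse_qs(concatenated(term)))
    print("  urlencoded:  ", encoded(term), "->", parse_qs(encoded(term)))
```

With concatenation, each + is decoded as a space and the & cuts the term off after "Tom ", while the urlencoded form decodes back to exactly what was typed.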


So please consider updating your code to URL-encode the search term instead. [The screenshot of the updated code is not preserved.]
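A minimal sketch of such an update, assuming the endpoint and the sSearch parameter name from the question are correct, builds the URL with urllib.parse.urlencode instead of string concatenation. build_search_url is a hypothetical helper name. If you stay with requests, passing params={"sSearch": product} to requests.get has the same effect, because requests encodes a params dictionary into the query string for you.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SEARCH_ENDPOINT = "http://www.somewebsite.com/search"  # placeholder URL from the question


def build_search_url(term: str, endpoint: str = SEARCH_ENDPOINT) -> str:
    """Return the search URL with the term safely percent-encoded."""
    return endpoint + "?" + urlencode({"sSearch": term})


# In the question's script this would replace the concatenation:
#     URL = build_search_url(product)
#     website = requests.get(URL)
# or, equivalently, let requests do the encoding:
#     website = requests.get(SEARCH_ENDPOINT, params={"sSearch": product})

if __name__ == "__main__":
    url = build_search_url("Playstation 4")
    print(url)
    # Decoding the query string gives back exactly what the user typed.
    print(parse_qs(urlsplit(url).query))
```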
