
How to get Python to get certain values with web scraping?

I am currently working on a way to get video game prices more easily. I got this web scraper to work on pricecharting.com for the prices, but it doesn't want to get the other values. The GitHub repo I am using is https://github.com/LonsterMonster/Pricecharting-Scraper. The repo searches pricecharting.com for the game name and gets the attributes and prices.

The error I am getting is:

Everything works except Genre and the fields after it, which all come back as "N/A". The code doesn't recognize the HTML tags I selected, even though they are the exact ones for the parts I need.

The code where the error occurs is below:

for EachPart in soup.select('div[id*="game-page"]'):
        try:
            title = re.findall(r'>(.*?)</a>', str(EachPart.select('h1[id="product_name"]'))).group()
        except AttributeError:
            title = re.findall(r'>(.*?)</a>', str(EachPart.select('h1[id="product_name"]')))
        if title:
            print(title)
        loosePrice = re.findall("\d+\.\d+", str(EachPart.select('td[id="used_price"] > span[class="price js-price"]')))
        loosePrice = loosePrice[0] if len(loosePrice) > 0 else "N/A"
        completePrice = re.findall("\d+\.\d+", str(EachPart.select('td[id="complete_price"] > span[class="price js-price"]')))
        completePrice = completePrice[0] if len(completePrice) > 0 else "N/A"
        newPrice = re.findall("\d+\.\d+", str(EachPart.select('td[id="new_price"] > span[class="price js-price"]')))
        newPrice = newPrice[0] if len(newPrice) > 0 else "N/A"
        
        Genre = re.findall("\d+\.\d+", str(EachPart.select('tr > td[itemprop="genre"]')))
        Genre = Genre[0] if len(Genre) > 0 else "N/A"
        ReleaseDate = re.findall("\d+\.\d+", str(EachPart.select('tr > td[itemprop="datePublished"]')))
        ReleaseDate = ReleaseDate[0] if len(ReleaseDate) > 0 else "N/A"
        ESRBRating = re.findall("\d+\.\d+", str(EachPart.select('tr > td[itemprop="contentRating"]')))
        ESRBRating = ESRBRating[0] if len(ESRBRating) > 0 else "N/A"
        Publisher = re.findall("\d+\.\d+", str(EachPart.select('tr > td[itemprop="publisher"]')))
        Publisher = Publisher[0] if len(Publisher) > 0 else "N/A"
        Developer = re.findall("\d+\.\d+", str(EachPart.select('tr > td[itemprop="author"]')))
        Developer = Developer[0] if len(Developer) > 0 else "N/A"
        ModelNumber = re.findall("\d+\.\d+", str(EachPart.select('tr > td[itemprop="model-number"]')))
        ModelNumber = ModelNumber[0] if len(ModelNumber) > 0 else "N/A"
        
        UPC = re.findall("\d+\.\d+", str(EachPart.select('tr[itemprop="identifier"] > td[itemprop="value"]')))
        UPC = UPC[0] if len(UPC) > 0 else "N/A"
        Description = re.findall("\d+\.\d+", str(EachPart.select('tr > td[itemprop="description"]')))
        Description = Description[0] if len(Description) > 0 else "N/A"

I am fairly new to Python but understand what I am doing. The repo mentioned above is the one I have been editing; I just got to a point where I am stuck. Thanks.

Edit #1: edited the code to match more closely how I have it.

Edit, error #1: Everything works except Genre and the fields after it, which all come back as "N/A". The code doesn't recognize the HTML tags I selected, even though they are the exact ones for the parts I need.

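You're calling group on a list. EachPart.select(...) returns a list, and so does re.findall(...) when you wrap it around that result; a list has no group method, so the call raises AttributeError. You can either loop through that list or take the first element (if you're sure there is only one).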
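To make the point above concrete, here is a minimal sketch using only the standard re module. The two sample strings are hypothetical stand-ins for what str(EachPart.select(...)) might produce; the real markup on pricecharting.com may differ. It also suggests why Genre and the later fields probably end up as "N/A": the pattern \d+\.\d+ only matches decimal numbers, so it can never match a text-only cell.

```python
import re

# Hypothetical stand-ins for str(EachPart.select(...)); the real page markup may differ.
price_cell = '[<span class="price js-price">$12.34</span>]'
genre_cell = '[<td itemprop="genre">Action & Adventure</td>]'

NUMBER = r"\d+\.\d+"  # raw string, so the backslashes reach the regex engine untouched

# re.findall returns a plain list of strings, so it has no .group() method.
matches = re.findall(NUMBER, price_cell)
price = matches[0] if matches else "N/A"

# re.search returns a match object (or None); that is where .group() belongs.
m = re.search(NUMBER, price_cell)
price_via_search = m.group() if m else "N/A"

# The numeric pattern finds nothing in a text-only cell, hence the "N/A" for genre.
genre_numeric = re.findall(NUMBER, genre_cell)

# Capturing the text between the tags works for non-numeric fields.
m = re.search(r">([^<]+)<", genre_cell)
genre = m.group(1).strip() if m else "N/A"

print(price, price_via_search, genre_numeric, genre)
```

If you would rather skip the regex for text fields, BeautifulSoup can hand you the text directly: select_one(...) returns a single tag (or None) instead of a list, and get_text(strip=True) on that tag returns its contents as a string.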

