
Beautiful Soup not finding CSS selector

I am using Beautiful Soup and Requests to scrape data from a website, and I am having trouble pulling data with a certain CSS selector. I used SelectorGadget ( https://selectorgadget.com/ ) to identify the selector I need on this page: https://www.oddsshark.com/ncaab/odds . It returns .op-bovada.\\lv as the selector. However, that does not work, and neither does escaping the backslash. I've tried multiple variants of this and hunted around online without any luck. I'm a bit of a Python beginner, so I have a hunch I'm overlooking something obvious.

This code reproduces the issue that I'm running into.

import requests, bs4
res = requests.get('https://www.oddsshark.com/ncaab/odds')
odds = bs4.BeautifulSoup(res.text, 'html.parser')

# This selector, for another class on the same page, works fine.
print(len(odds.select('.op-opening')))

# However, neither of these finds anything.
print(len(odds.select('.op-bovada.\lv')))
print(len(odds.select('.op-bovada.\\lv')))

I've had no problems doing this in R, where it just needed double backslashes, so I know the data is there, but I'm beating my head against a wall in Python at the moment.

Why not use Beautiful Soup's find_all method?

print(len(odds.find_all(class_='op-bovada.lv')))

The problem comes from the way select parses the string it is given. In a CSS selector, a . introduces a class name, but here the . is part of the class name itself (op-bovada.lv). The parser therefore reads the selector as two separate classes, op-bovada and lv, and finds no element that has both. Passing the name to the class_ argument of find_all skips selector parsing, so the . is matched literally and you get the desired result.
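To make the difference concrete, here is a minimal sketch that runs against inline HTML rather than the live page. The markup is hypothetical; only the class names come from the question. It also tries two select-based alternatives that are not part of the answer above: escaping the dot itself (backslash before the ., written as a raw string so Python leaves the backslash alone) and an attribute selector. Both assume a Beautiful Soup release whose select() is backed by Soup Sieve (4.7 or later).

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the odds page; only the class
# names are taken from the question.
html = """
<div class="op-item op-opening">-3.5</div>
<div class="op-item op-bovada.lv">-4</div>
<div class="op-item op-bovada.lv">-4.5</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all compares the class name literally, so the dot needs no escaping.
by_find_all = soup.find_all(class_="op-bovada.lv")

# In a selector, the dot inside the class name must be escaped.
# The raw string keeps the backslash intact for the CSS parser.
by_escaped_select = soup.select(r".op-bovada\.lv")

# An attribute selector avoids escaping by quoting the class name.
by_attribute = soup.select('[class~="op-bovada.lv"]')

# Unescaped, this means "class op-bovada AND class lv".
unescaped = soup.select(".op-bovada.lv")

print(len(by_find_all), len(by_escaped_select), len(by_attribute), len(unescaped))
# prints: 2 2 2 0
```

If the escaped selector is what you want, the raw string matters: in an ordinary string literal you would have to write the backslash twice to get a single backslash into the selector.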

