I'm working on a personal project at the moment and ran into some trouble.
I'm using Beautiful Soup to scrape some user replies off a web page. I'd like to specifically scrape the number of downvotes and upvotes on their post but I haven't been able to successfully do so.
Below is the HTML that contains the number of upvotes for a user's post. Each user's element has a different name attribute, ending in an ID such as the 171119643 shown here, so I have been confused as to how I can scrape all of the name elements.
<strong id="cmt_o_cnt_171119643" name="cmt_o_cnt_171119643">756</strong>
I did notice each name starts with the same string: cmt_o_cnt_. Is there a way I can scrape for elements starting with that string using the code below?
for url in soup.find_all('strong', name_=''):
A non-regex solution would be to check whether the substring "cmt_o_cnt_" is in the tag's name attribute:
for tag in soup.find_all('strong'):
    # .get() avoids a KeyError on <strong> tags that have no name attribute
    if "cmt_o_cnt_" in tag.get('name', ''):
        print(tag['name'])  # or do your stuff
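For comparison, here is a minimal sketch of the regex route, which matches only names that start with the prefix rather than names that contain it anywhere. The sample HTML (including the "other_1" element and the nameless tag) is made up for illustration, and "html.parser" is used only to avoid an extra dependency. Note that the first positional argument of find_all() is the tag name, so the HTML attribute called name has to go through the attrs dictionary.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup: only the first element comes from the question.
html = '''
<strong id="cmt_o_cnt_171119643" name="cmt_o_cnt_171119643">756</strong>
<strong id="other_1" name="other_1">5</strong>
<strong>no name attribute</strong>
'''

soup = BeautifulSoup(html, "html.parser")

# Anchor the pattern with ^ so only names beginning with the prefix match.
prefix = re.compile(r"^cmt_o_cnt_")

# Pass the HTML "name" attribute via attrs, because the keyword "name"
# in find_all() refers to the tag name itself.
matches = soup.find_all("strong", attrs={"name": prefix})

# Convert the text of each matching tag to an integer vote count.
upvotes = [int(tag.get_text()) for tag in matches]
print(upvotes)
```

Tags without a name attribute are simply skipped here, so no KeyError handling is needed.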
By using CSS selectors, you can select the elements whose name attribute contains that string.
from bs4 import BeautifulSoup
html = '''
<strong id="cmt_o_cnt_171119643" name="cmt_o_cnt_171119643">756</strong>
<strong id="cmt_o_cnt_171119644" name="cmt_o_cnt_171119644">256</strong>
<strong id="cmt_o_cnt_171119645" name="cmt_o_cnt_171119645">123</strong>
'''
soup = BeautifulSoup(html, "lxml")
for tag in soup.select('strong[name*="cmt_o_cnt_"]'):
    print(tag['name'])
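Since the question asks for names starting with the string, a slightly stricter variant may be useful: the *= operator matches the substring anywhere in the attribute, while ^= matches only at the beginning. The sketch below assumes the same markup shape as above. The "x_cmt_o_cnt_1" element and the counts dictionary are hypothetical, added only to show the difference and one way to collect the numbers.

```python
from bs4 import BeautifulSoup

# Sample markup; the last element is invented to show what ^= excludes.
html = '''
<strong id="cmt_o_cnt_171119643" name="cmt_o_cnt_171119643">756</strong>
<strong id="cmt_o_cnt_171119644" name="cmt_o_cnt_171119644">256</strong>
<strong id="x_cmt_o_cnt_1" name="x_cmt_o_cnt_1">9</strong>
'''

soup = BeautifulSoup(html, "html.parser")
prefix = "cmt_o_cnt_"

# Map each comment ID (the part after the prefix) to its upvote count.
counts = {}
for tag in soup.select(f'strong[name^="{prefix}"]'):
    comment_id = tag["name"][len(prefix):]
    counts[comment_id] = int(tag.get_text(strip=True))

print(counts)
```

With *= in place of ^=, the third element would also be selected, because its name contains the substring even though it does not start with it.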
You can check some usages of css selectors here