
How can I find all the 'name' elements that begin with a specific string?

I'm working on a personal project at the moment and ran into some trouble.

I'm using Beautiful Soup to scrape some user replies off a web page. I'd like to specifically scrape the number of downvotes and upvotes on their post but I haven't been able to successfully do so.

Below is the HTML that contains the number of upvotes for a user's post. Each post has a different ID at the end of its name attribute (171119643 in this example), so I'm not sure how to match all of these elements at once.

<strong id="cmt_o_cnt_171119643" name="cmt_o_cnt_171119643">756</strong>

I did notice that each name starts with the same string: cmt_o_cnt_. Is there a way I can scrape for elements whose name starts with that string, using something like the code below?

for url in soup.find_all('strong', name_=''):

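A non-regex solution is to check whether the name attribute starts with "cmt_o_cnt_". Reading it with tag.get('name', '') instead of tag['name'] avoids a KeyError on any <strong> tag that has no name attribute:

for tag in soup.find_all('strong'):
    # .get() falls back to '' when the tag has no name attribute
    if tag.get('name', '').startswith("cmt_o_cnt_"):
        print(tag['name'])  # or do your stuff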
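For comparison, a regex-based version might look like the sketch below. Note that find_all('strong', name=...) cannot be used here, because name is already the parameter that holds the tag name; the attribute has to be passed through the attrs dictionary instead. The sample HTML (including the other_1 name) is made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Illustrative HTML: only the first <strong> should match.
html = '''
<strong id="cmt_o_cnt_171119643" name="cmt_o_cnt_171119643">756</strong>
<strong name="other_1">12</strong>
<strong>no name attribute</strong>
'''

soup = BeautifulSoup(html, "html.parser")

# ^ anchors the pattern so only names that START with the prefix match.
pattern = re.compile(r"^cmt_o_cnt_")

# The HTML "name" attribute must go through attrs, since the first
# argument of find_all() is the tag name.
matches = soup.find_all("strong", attrs={"name": pattern})

names = [tag["name"] for tag in matches]
print(names)
```

Tags without a name attribute are simply skipped by this filter, so no extra guard is needed.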

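With a CSS selector you can match the elements you want directly. The attribute selector [name^="cmt_o_cnt_"] matches values that start with the prefix, whereas [name*="cmt_o_cnt_"] would match the prefix anywhere in the value.

from bs4 import BeautifulSoup

html = '''
  <strong id="cmt_o_cnt_171119643" name="cmt_o_cnt_171119643">756</strong>
  <strong id="cmt_o_cnt_171119644" name="cmt_o_cnt_171119644">256</strong>
  <strong id="cmt_o_cnt_171119645" name="cmt_o_cnt_171119645">123</strong>
'''
# "lxml" requires the lxml package; "html.parser" works without it
soup = BeautifulSoup(html, "lxml")
# ^= means "attribute value starts with"
for tag in soup.select('strong[name^="cmt_o_cnt_"]'):
    print(tag['name'])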

You can check some usages of css selectors here
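Since the goal in the question is the vote count rather than the name itself, here is a minimal sketch that collects the upvote numbers into a dictionary keyed by the numeric suffix. The variable names (PREFIX, upvotes, comment_id) are illustrative, and it assumes the element text is always a plain integer.

```python
from bs4 import BeautifulSoup

html = '''
  <strong id="cmt_o_cnt_171119643" name="cmt_o_cnt_171119643">756</strong>
  <strong id="cmt_o_cnt_171119644" name="cmt_o_cnt_171119644">256</strong>
  <strong id="cmt_o_cnt_171119645" name="cmt_o_cnt_171119645">123</strong>
'''

PREFIX = "cmt_o_cnt_"
soup = BeautifulSoup(html, "html.parser")

upvotes = {}
for tag in soup.select(f'strong[name^="{PREFIX}"]'):
    # Strip the shared prefix to keep only the numeric ID.
    comment_id = tag["name"][len(PREFIX):]
    # Assumes the tag text is a plain integer such as "756".
    upvotes[comment_id] = int(tag.get_text(strip=True))

print(upvotes)
```

The same loop should work for the downvote counter once its name prefix is known; that prefix is not shown in the question, so it is left out here.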


 