
Beautiful Soup (Python) not seeing text inside of span

I can't figure out why BS4 is not seeing the text inside of the span in the following scenario:

My code:

stars = soup.find('span', {'class': 'github-repo-info__item', 'data-key': 'stargazers_count'}).text

also tried:

stars = soup.find('span', {'class': 'github-repo-info__item', 'data-key': 'stargazers_count'}).get_text()

Both return an empty string (''). The element itself seems to be located correctly (I can browse through its parents and siblings in the PyCharm debugger without a problem). Fetching text in other parts of the website also works perfectly fine. Only the GitHub-related stats fail to fetch.

Any ideas?

This happens because the page uses JavaScript to load that content dynamically, so the text is not in the HTML you get from response.text.

The source code of the page: [enter image description here]
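
To reproduce the effect without the site, here is a minimal sketch. The `raw_html` string is a hypothetical stand-in for what the server sends before any JavaScript runs (the real markup will differ); it only assumes the span is present but empty, as described in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the server response before JavaScript fills in the stats.
raw_html = """
<div>
  <span class="github-repo-info__item" data-key="stargazers_count"></span>
</div>
"""

soup = BeautifulSoup(raw_html, "html.parser")
span = soup.find(
    "span",
    {"class": "github-repo-info__item", "data-key": "stargazers_count"},
)

print(span is not None)        # True: the element is located
print(repr(span.get_text()))   # '': but there is no text to extract
```

So `find()` succeeds and `.text` is still empty, which matches what the debugger showed.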

You could query the GitHub API directly instead:

import requests

r = requests.get('https://api.github.com/repos/psf/requests')
print(r.json()["stargazers_count"])

Result:

43010
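
If the repository is not hard-coded, the same call can be wrapped in a small helper. This is only a sketch: `api_url_for` and `star_count` are hypothetical names, and it assumes you already have the repository's github.com URL (for example from a link on the page you are scraping).

```python
from urllib.parse import urlparse

import requests

API_ROOT = "https://api.github.com/repos"


def api_url_for(repo_url):
    """Map https://github.com/<owner>/<repo> to the matching API URL."""
    parts = [p for p in urlparse(repo_url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {repo_url!r}")
    owner, repo = parts[0], parts[1]
    return f"{API_ROOT}/{owner}/{repo}"


def star_count(repo_url, timeout=10):
    """Return the repository's star count as reported by the API."""
    response = requests.get(api_url_for(repo_url), timeout=timeout)
    # Unauthenticated requests are rate-limited; an error status raises here
    # instead of failing later with a confusing KeyError.
    response.raise_for_status()
    return response.json()["stargazers_count"]
```

Usage would be `star_count("https://github.com/psf/requests")`; the number it returns changes over time.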

With bs4 alone, you can't scrape the star count.

If you inspect the site and check the response HTML, you will find the element with the class "github-repo-info__item", but it contains no text.

In this case, use Selenium.
