
Extract Google Search Engine Result

I would like to scrape the number of results that Google Search reports for a given word using Python, but I can't get it to work with the BeautifulSoup and Requests libraries. If anyone can help me out, that would be great.

A screenshot is attached to show the number I want to extract.

[Screenshot: Google search results for 'decoration']

If you inspect the search page, you can see that the value is inside a div with an id (result-stats). [Screenshot: inspection result]

That suits your purpose, since an id uniquely identifies an element within the page. A quick Google search (first result) shows how to get an element by its id; you can then read its contents from the "text" property. You will also need to parse that text to extract only the number.
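The parsing step can be sketched as follows. This is only a minimal example: the function name parse_result_count is hypothetical, and it assumes the text looks like "About 1,230,000 results (0.45 seconds)", with the count being the first number in the string. Other languages or page layouts may need a different pattern.

```python
import re


def parse_result_count(stats_text):
    """Return the first number in the stats text as an int, or None if absent.

    Assumes English-style text such as 'About 1,230,000 results (0.45 seconds)'.
    """
    # Match the first run of digits, allowing comma or dot thousands separators.
    match = re.search(r"\d[\d.,]*", stats_text)
    if match is None:
        return None
    # Drop the separators and convert what is left to an integer.
    return int(re.sub(r"\D", "", match.group()))
```

For example, parse_result_count(text) could be called on the string read from the div's "text" property.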

Edit: It looks like Google does not return the full page unless you provide a user agent. If you send the "User-Agent" header with the value from your browser, it should work. A quick way to check for yourself is to run the request in Postman and search the response for the element you need.

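import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent; without it the full page is not returned.
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36"}
response = requests.get("https://www.google.com/search?q=f", headers=headers)
html = BeautifulSoup(response.text, features="html.parser")

# The result count is inside the div whose id is "result-stats".
text = html.body.find("div", attrs={"id": "result-stats"}).text

print(text)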

Also, it's worth mentioning that Google provides endpoints for exactly this purpose. Here's another question about the same problem (link).
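The answer does not name the endpoint, so the following is only a sketch under the assumption that the Custom Search JSON API is meant. The helper names are hypothetical, and api_key and engine_id are placeholders for credentials you would have to create yourself. The count reported by the API may also differ from the one shown on the search page.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: Google's Custom Search JSON API.
API_URL = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id):
    """Build the request URL; api_key and engine_id are your own credentials."""
    params = urllib.parse.urlencode({"key": api_key, "cx": engine_id, "q": query})
    return f"{API_URL}?{params}"


def total_results(payload):
    """Read the result count from a decoded JSON response."""
    # The API returns totalResults as a string, so convert it to an int.
    return int(payload["searchInformation"]["totalResults"])


def fetch_total_results(query, api_key, engine_id):
    """Call the API and return the reported number of results."""
    url = build_search_url(query, api_key, engine_id)
    with urllib.request.urlopen(url) as response:
        return total_results(json.load(response))
```

With valid credentials, fetch_total_results("decoration", api_key, engine_id) would return the count as an integer without any HTML parsing.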
