
Opening top google search results in Python

I am trying to open the top 5 Google search results, but my code does not open them. Instead, it opens five tabs: Google, Google web results, Google Images, Google News, and Google Books. My code is below:

import requests, sys, webbrowser, bs4

res = requests.get('https://google.com/search?q=' + ' '.join(sys.argv[1:]))
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')
linkElems = soup.select(r'a')
numOpen = min(5, len(linkElems))
for i in range(numOpen):
    webbrowser.open('https://google.com' + linkElems[i].get('href'))

Please help. I want the code to open the top five search results and not images or books.

As mentioned, select your elements more specifically, but avoid relying on dynamic class names. Instead, use a CSS selector based on the page structure, such as links that directly contain an h3 heading:

soup.select('a:has(>h3)')

Example

import requests
from bs4 import BeautifulSoup

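# Send a browser-like User-Agent and a consent cookie with the request
res = requests.get('https://google.com/search?q=test',
                   headers={'User-Agent': 'Mozilla/5.0'},
                   cookies={'CONSENT': 'YES+'})
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(res.text, 'html.parser')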
soup.select('a:has(>h3)')
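
To go from that selector to actually opening the top five results, something like the following might work. It is only a sketch: the helper names `top_result_links` and `open_top_results` are made up for illustration, and the handling of `/url?q=...` redirect links is an assumption about how Google sometimes wraps result URLs, so check it against the HTML you really get back.

```python
import webbrowser
from urllib.parse import parse_qs, urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def top_result_links(html, limit=5):
    """Return up to `limit` result URLs found in a results page (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    # Only links that directly wrap an <h3> title; this skips the nav bar links
    for a in soup.select('a:has(> h3)'):
        href = a.get('href')
        if not href:
            continue
        # Assumption: some responses wrap the target as /url?q=<target>&...
        if href.startswith('/url?'):
            target = parse_qs(urlparse(href).query).get('q')
            if not target:
                continue
            href = target[0]
        # Absolute URLs pass through unchanged; relative ones are resolved
        links.append(urljoin('https://google.com', href))
        if len(links) == limit:
            break
    return links


def open_top_results(query, limit=5):
    """Fetch the results page for `query` and open the top links in the browser."""
    res = requests.get('https://google.com/search',
                       params={'q': query},
                       headers={'User-Agent': 'Mozilla/5.0'},
                       cookies={'CONSENT': 'YES+'})
    res.raise_for_status()
    for url in top_result_links(res.text, limit):
        webbrowser.open(url)
```

From a script you could call it as open_top_results(' '.join(sys.argv[1:])) after importing sys. Passing the query through params lets requests handle the URL encoding.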

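Your selector, soup.select('a'), matches every link on the page, including the navigation links at the top, which is why the first five are Google, Images, News, and so on. You should find a stable class, id, or other attribute on the result page and filter by it. Once you have confirmed that the filtered links are the ones you want, take the first five.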

Finding a stable attribute takes some inspection of the result page. The class shown in the screenshot below appears to work, but you need to make sure that this class name is not used anywhere on the page before the search results.

Also, I think Google limits automated requests to its search page in some way, and it strongly advises against crawling its normal search, recommending its provided APIs instead. That option is worth considering too.
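
For the API route, a rough sketch using Google's Custom Search JSON API could look like this. It assumes you have created an API key and a Programmable Search Engine ID (the `api_key` and `cx` arguments below are placeholders you must supply), and the helper names are made up for illustration. Check the current API documentation for quotas and parameters before relying on it.

```python
import webbrowser

import requests

API_URL = 'https://www.googleapis.com/customsearch/v1'


def extract_links(payload, limit=5):
    """Pull up to `limit` result URLs out of a parsed JSON response."""
    # Each entry in "items" is expected to carry the result URL under "link"
    return [item['link'] for item in payload.get('items', [])[:limit]]


def open_top_results_via_api(query, api_key, cx, limit=5):
    """Query the API and open the top results (hypothetical helper)."""
    res = requests.get(API_URL,
                       params={'key': api_key, 'cx': cx, 'q': query, 'num': limit})
    res.raise_for_status()
    for url in extract_links(res.json(), limit):
        webbrowser.open(url)
```

This avoids parsing HTML altogether, so it should not break when the layout of the result page changes.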

[Screenshot: Google search HTML]

