
Unable to retrieve links off google search results page using BeautifulSoup

I'm trying to grab all the relevant links that show up on the results page for a given query using bs4, and then open them in a new browser window.

The problem is that I'm not getting the relevant links. For any query, my script returns links to things like Gmail and Google Images -- not links relevant to the query.

#!/usr/bin/python3
import webbrowser as wb
import requests
import bs4 as bs

search = input()
url = "https://www.google.ae/?gfe_rd=cr&ei=mgSoWKmWO-aG7gTgmJ2QDA&gws_rd=ssl#q=" + search
user_agent = {'User-Agent': 'Mozilla/5.0'}
#user_agent = {'User-Agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17'}

req = requests.get(url, headers=user_agent)
soup = bs.BeautifulSoup(req.text, "lxml")
print(req.status_code)
for link in soup.find_all("a"):
    print(link.get("href"))
    if search in link.text:
        wb.open(link.get("href"))

I tried changing my user-agent to a really old one in the hope that Google might fall back to plain HTML, but no luck with that.
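One likely reason the User-Agent makes no difference: everything after `#` in a URL is a fragment, which clients keep to themselves and never send to the server, so the request above fetches Google's homepage regardless of the query. A quick sketch with the standard library shows what actually goes over the wire (the query string shown here is just the one from the URL above):

```python
from urllib.parse import urlparse

# The "#q=..." part of the original URL is a fragment: browsers (and
# requests) never transmit it, so the server sees only the homepage
# request, whatever User-Agent header is sent.
url = "https://www.google.ae/?gfe_rd=cr&gws_rd=ssl#q=beautifulsoup"
parts = urlparse(url)
print(parts.query)     # sent to the server: 'gfe_rd=cr&gws_rd=ssl'
print(parts.fragment)  # kept client-side only: 'q=beautifulsoup'
```

The no-JS search endpoint puts the query in the query string instead, e.g. `https://www.google.com/search?q=...`.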

I know it's possible to retrieve links with the Google Search API, but I'm curious to know whether there's any way I can get the job done with bs4 instead.
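For reference, a minimal bs4 sketch of the extraction step: in the plain-HTML results page, Google wraps each organic result in a `/url?q=...` redirect link, so the real targets can be pulled out of the `q` query parameter. The snippet runs on a small hypothetical sample of that markup rather than a live response, since the exact page structure varies:

```python
import bs4 as bs
from urllib.parse import parse_qs, urlparse

# A stand-in for the HTML Google serves to non-JS clients: organic
# results wrap their targets in "/url?q=..." redirect links, while
# navigation links (Gmail, Images, ...) point elsewhere.
sample_html = """
<a href="/url?q=https://en.wikipedia.org/wiki/Example&amp;sa=U">Example - Wikipedia</a>
<a href="https://mail.google.com/">Gmail</a>
<a href="/url?q=https://www.iana.org/domains/example&amp;sa=U">IANA</a>
"""

def result_links(html):
    """Extract the real target URLs from /url?q= redirect links."""
    soup = bs.BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a", href=True):
        href = a["href"]
        if href.startswith("/url?"):
            # The destination is carried in the "q" query parameter.
            target = parse_qs(urlparse(href).query).get("q", [None])[0]
            if target:
                links.append(target)
    return links

print(result_links(sample_html))
# ['https://en.wikipedia.org/wiki/Example', 'https://www.iana.org/domains/example']
```

Filtering on the `/url?` prefix is what drops the Gmail/Images-style navigation links that the original loop was picking up.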

You can use the google package (installed with pip install google), which gives intuitive access to Google's search results.

from google import search
for result in search('example'):
    print(result)
