
soup.select('.r a') returns an empty list?

I am trying to build a program that searches Google for the terms given on the command line and opens at most 5 of the search results in the browser. However, soup.select('.r a') in the program returns an empty list.

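import sys
import webbrowser

import bs4
import requests

# Join the command-line arguments with spaces to form the search query
res = requests.get('https://google.com/search?q=' + ' '.join(sys.argv[1:]))
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')
linkElements = soup.select('.r a')  # this is the call that returns an empty list
linkToOpen = min(5, len(linkElements))
for i in range(linkToOpen):
    # hrefs are relative to the Google domain, so prefix the scheme and host
    webbrowser.open('https://google.com' + linkElements[i].get('href'))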

The code runs without any error or output, but it does not open the browser with the search results as it is supposed to.

The page relies heavily on JavaScript, so the HTML you get through requests isn't exactly what you see in your browser. You could use this script to scrape search links from the page (it selects all links whose href begins with /url?q= ):

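import bs4
import requests

res = requests.get('http://google.com/search?q=Python')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'lxml')  # the 'lxml' parser needs the lxml package installed

# Select every anchor whose href starts with /url?q=
for a in soup.select('a[href^="/url?q="]'):
    # Skip the sign-in link, which uses the same prefix
    if 'accounts.google.com' in a['href']:
        continue
    print(a['href'])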

Prints:

/url?q=https://www.python.org/&sa=U&ved=2ahUKEwiCivuzmMDjAhVtxMQBHWfIBxYQFjAAegQIBxAB&usg=AOvVaw3TQfZO4gqXrTLm27x1qkJF
/url?q=https://www.python.org/downloads/&sa=U&ved=2ahUKEwiCivuzmMDjAhVtxMQBHWfIBxYQjBAwAXoECAcQAw&usg=AOvVaw1ktQJcwOoHkm6N4OpYlgA-
/url?q=https://www.python.org/downloads/release/python-373/&sa=U&ved=2ahUKEwiCivuzmMDjAhVtxMQBHWfIBxYQjBAwAnoECAcQBQ&usg=AOvVaw1DkCjMJbFGfNpiQw1qDBWB
/url?q=https://www.python.org/about/gettingstarted/&sa=U&ved=2ahUKEwiCivuzmMDjAhVtxMQBHWfIBxYQjBAwA3oECAcQBw&usg=AOvVaw1ih35T-Enlb7d32gyyNGvc
...and so on.
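
The printed hrefs are Google redirect links, so to open the actual result pages you would probably want to pull the destination out of the q parameter first. Here is a minimal sketch using only the standard library. The helper names build_search_url and extract_target are made up for illustration, and it assumes the links keep the /url?q=... shape shown in the output above:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(terms):
    """Join the search terms with spaces and URL-encode them as the q parameter.

    Hypothetical helper; the host and path follow the search URL used above.
    """
    return 'https://www.google.com/search?' + urlencode({'q': ' '.join(terms)})


def extract_target(href):
    """Return the destination stored in the q parameter of a /url?q=... link.

    Returns None when the link has no q parameter.
    """
    values = parse_qs(urlsplit(href).query).get('q')
    return values[0] if values else None


if __name__ == '__main__':
    print(build_search_url(['python', 'tutorial']))
    # First href from the output above
    href = ('/url?q=https://www.python.org/&sa=U'
            '&ved=2ahUKEwiCivuzmMDjAhVtxMQBHWfIBxYQFjAAegQIBxAB'
            '&usg=AOvVaw3TQfZO4gqXrTLm27x1qkJF')
    print(extract_target(href))
```

In the original program you could then pass extract_target(a['href']) to webbrowser.open for the first five links, skipping any that come back as None.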
