Retrieving links from a Google search using BeautifulSoup in Python

Question

I'm building a Twitter bot using Tweepy and BeautifulSoup4. I'd like to save the results of a request in a list, but my script has stopped working (it was working a few days ago). I've been looking at it and I don't understand why. Here is my function:

import requests
import tweepy
from bs4 import BeautifulSoup
import urllib
import os
from tweepy import StreamListener
from TwitterEngine import TwitterEngine
from ConfigEngine import TwitterAPIConfig
import urllib.request
import emoji
import random

# desktop user-agent
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
# mobile user-agent
MOBILE_USER_AGENT = "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36"




# Retrieve the links
def parseLinks(url):
    headers = {"user-agent": USER_AGENT}
    resp = requests.get(url, headers=headers)
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        results = []
        for g in soup.find_all('div', class_='r'):
            anchors = g.find_all('a')
            if anchors:
                link = anchors[0]['href']
                results.append(link)
        return results

The "url" parameter is 100% correct in the rest of the code. As output, I get "None". To be more precise, execution stops right after the line "results = []" (so it never enters the for loop).

Any ideas? Thank you so much in advance!

Answer 1

It seems that Google changed the HTML markup on the page. Try changing the search from class="r" to class="rc":

import requests
from bs4 import BeautifulSoup


USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"

def parseLinks(url):
    headers = {"user-agent": USER_AGENT}
    resp = requests.get(url, headers=headers)
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        results = []
        for g in soup.find_all('div', class_='rc'): # <-- change 'r' to 'rc'
            anchors = g.find_all('a')
            if anchors:
                link = anchors[0]['href']
                results.append(link)
        return results

url = 'https://www.google.com/search?q=tree'
print(parseLinks(url))

Prints:

['https://en.wikipedia.org/wiki/Tree', 'https://simple.wikipedia.org/wiki/Tree', 'https://www.britannica.com/plant/tree', 'https://www.treepeople.org/tree-benefits', 'https://books.google.sk/books?id=yNGrqIaaYvgC&pg=PA20&lpg=PA20&dq=tree&source=bl&ots=_TP8PqSDlT&sig=ACfU3U16j9xRJgr31RraX0HlQZ0ryv9rcA&hl=sk&sa=X&ved=2ahUKEwjOq8fXyKjsAhXhAWMBHToMDw4Q6AEwG3oECAcQAg', 'https://teamtrees.org/', 'https://www.woodlandtrust.org.uk/trees-woods-and-wildlife/british-trees/a-z-of-british-trees/', 'https://artsandculture.google.com/entity/tree/m07j7r?categoryId=other']
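
Since the class name has already changed once, it may help to keep the parsing step separate from the HTTP request so the selector can be checked against saved HTML without hitting Google. The following is only a minimal sketch under that assumption: extract_links, its container_class parameter, and SAMPLE_HTML are hypothetical names introduced for illustration, and the sample markup is made up rather than real Google output.

```python
from bs4 import BeautifulSoup


def extract_links(html, container_class="rc"):
    """Return the first href found inside each <div> with the given class.

    Hypothetical helper: it only parses a string, so it never touches the network.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for block in soup.find_all("div", class_=container_class):
        # Take the first anchor in the block that actually carries an href.
        anchor = block.find("a", href=True)
        if anchor is not None:
            links.append(anchor["href"])
    return links


# Made-up markup standing in for a results page (an assumption, not Google's HTML).
SAMPLE_HTML = """
<div class="rc"><a href="https://example.com/a"><h3>A</h3></a></div>
<div class="rc"><span>no link here</span></div>
<div class="other"><a href="https://example.com/skip">skip</a></div>
<div class="rc"><a href="https://example.com/b">B</a></div>
"""

print(extract_links(SAMPLE_HTML))                       # selector matches two blocks with links
print(extract_links(SAMPLE_HTML, container_class="r"))  # stale selector yields an empty list
```

With a helper like this, a stale selector shows up as an empty list. Note also that parseLinks as written above returns None whenever the status code is not 200, so a None result can point at the request itself rather than at the class name.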
