Retrieving links from a Google search with BeautifulSoup in Python

Question

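I am building a Twitter bot with Tweepy and BeautifulSoup4. I want to save the results of a request in a list, but my script no longer works (it was working a few days ago). I have been staring at it and cannot figure out why. Here is my function: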

import requests
import tweepy
from bs4 import BeautifulSoup
import urllib
import os
from tweepy import StreamListener
from TwitterEngine import TwitterEngine
from ConfigEngine import TwitterAPIConfig
import urllib.request
import emoji
import random

# desktop user-agent
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
# mobile user-agent
MOBILE_USER_AGENT = "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36"




# Retrieve the links
def parseLinks(url):
    headers = {"user-agent": USER_AGENT}
    resp = requests.get(url, headers=headers)
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        results = []
        for g in soup.find_all('div', class_='r'):
            anchors = g.find_all('a')
            if anchors:
                link = anchors[0]['href']
                results.append(link)
        return results

The "url" argument is 100% correct in the rest of the code. As output, I get None. More precisely, execution stops after the line results = [] (so it never enters the for loop).

Any ideas? Thank you very much!
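
A note on the symptom, offered as a minimal sketch rather than a diagnosis of the original script: in a function shaped like parseLinks, None is returned only when the status check fails, while a 200 response with no matching elements yields an empty list. The stand-in below takes the status code and the already-matched links as plain arguments (parse_links, status_code and matched_links are hypothetical names, and nothing is fetched) so the two cases can be told apart.

```python
def parse_links(status_code, matched_links):
    """Mimic the control flow of parseLinks without any network access.

    matched_links stands in for the hrefs that the find_all loop would visit.
    """
    if status_code == 200:
        results = []
        for link in matched_links:
            results.append(link)
        return results
    # No return statement on this path, so Python returns None implicitly.


print(parse_links(429, ["https://example.com/"]))  # status check fails
print(parse_links(200, []))                        # status OK, nothing matched
print(parse_links(200, ["https://example.com/"]))  # status OK, one match
```

Printing resp.status_code before the if statement is a quick way to see which of these paths a real request takes.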

Answer 1

Google appears to have changed the HTML markup on the page. Try changing the search from class="r" to class="rc":

import requests
from bs4 import BeautifulSoup


USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"

def parseLinks(url):
    headers = {"user-agent": USER_AGENT}
    resp = requests.get(url, headers=headers)
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        results = []
        for g in soup.find_all('div', class_='rc'): # <-- change 'r' to 'rc'
            anchors = g.find_all('a')
            if anchors:
                link = anchors[0]['href']
                results.append(link)
        return results

url = 'https://www.google.com/search?q=tree'
print(parseLinks(url))

Prints:

['https://en.wikipedia.org/wiki/Tree', 'https://simple.wikipedia.org/wiki/Tree', 'https://www.britannica.com/plant/tree', 'https://www.treepeople.org/tree-benefits', 'https://books.google.sk/books?id=yNGrqIaaYvgC&pg=PA20&lpg=PA20&dq=tree&source=bl&ots=_TP8PqSDlT&sig=ACfU3U16j9xRJgr31RraX0HlQZ0ryv9rcA&hl=sk&sa=X&ved=2ahUKEwjOq8fXyKjsAhXhAWMBHToMDw4Q6AEwG3oECAcQAg', 'https://teamtrees.org/', 'https://www.woodlandtrust.org.uk/trees-woods-and-wildlife/british-trees/a-z-of-british-trees/', 'https://artsandculture.google.com/entity/tree/m07j7r?categoryId=other']
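
Since class names such as "r" and "rc" can change without notice, one possible alternative is to rely on the page structure instead. The sketch below assumes that each result title is an h3 element nested inside its link. That is a guess about the markup, not something Google documents, and extract_result_links and SAMPLE_HTML are hypothetical names used only for illustration. It parses a static string, so it runs without a network request.

```python
from bs4 import BeautifulSoup


def extract_result_links(html):
    """Collect the href of every <a> tag that wraps an <h3> title.

    Assumption: result titles are rendered as <h3> inside the result link.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for heading in soup.find_all("h3"):
        anchor = heading.find_parent("a")
        # Skip headings outside a link and relative (internal) hrefs.
        if anchor is not None and anchor.get("href", "").startswith("http"):
            links.append(anchor["href"])
    return links


# Hypothetical markup for illustration only; real result pages differ.
SAMPLE_HTML = """
<div class="rc">
  <a href="https://en.wikipedia.org/wiki/Tree"><h3>Tree - Wikipedia</h3></a>
  <a href="/search?q=related">Related</a>
</div>
<div class="xyz">
  <a href="https://www.britannica.com/plant/tree"><h3>Tree | Britannica</h3></a>
</div>
"""

print(extract_result_links(SAMPLE_HTML))
```

On the sample above this prints the two absolute URLs and ignores the relative link. With a live page, resp.content from the function above would be passed in place of SAMPLE_HTML, and the h3 assumption would need to be checked against the HTML actually returned.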

Answer 1
1, accepted, 2020-10-09 22:35:11
