
Retrieving links from a Google search using BeautifulSoup in Python

I'm building a Twitter bot using Tweepy and BeautifulSoup4. I'd like to save the results of a search request in a list, but my script isn't working anymore (it was working a few days ago). I've been looking at it and I don't understand why. Here is my function:

import requests
import tweepy
from bs4 import BeautifulSoup
import urllib
import os
from tweepy import StreamListener
from TwitterEngine import TwitterEngine
from ConfigEngine import TwitterAPIConfig
import urllib.request
import emoji
import random

# desktop user-agent
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
# mobile user-agent
MOBILE_USER_AGENT = "Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36"




# Retrieving links
def parseLinks(url):
    headers = {"user-agent": USER_AGENT}
    resp = requests.get(url, headers=headers)
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        results = []
        for g in soup.find_all('div', class_='r'):
            anchors = g.find_all('a')
            if anchors:
                link = anchors[0]['href']
                results.append(link)
        return results

The "url" parameter is definitely correct in the rest of the code. As output, I get None. To be more precise, execution stops right after the line "results = []" (so it never enters the for loop).

Any ideas? Thank you so much in advance!
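Here is a minimal sketch, assuming bs4 is installed. The two symptoms described above come from different paths through parseLinks. The function returns None when the status code is not 200, because the if block is skipped and there is no other return statement. It returns an empty list when the page is fetched but no div matches the class. The helper parse_links_debug and the sample HTML are hypothetical. The helper takes the HTML and status code as arguments instead of calling requests.get, so both cases can be reproduced offline.

```python
from bs4 import BeautifulSoup


def parse_links_debug(html, status_code, selector_class="r"):
    """Hypothetical offline variant of parseLinks for diagnosing empty results."""
    if status_code != 200:
        # parseLinks has no else branch, so a non-200 response yields None.
        return None
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # class_ matches elements whose class list contains this exact token.
    for g in soup.find_all("div", class_=selector_class):
        anchors = g.find_all("a")
        if anchors:
            results.append(anchors[0]["href"])
    # An empty list means the page was parsed but no <div> matched the class.
    return results


sample = '<div class="rc"><a href="https://example.com/">Example</a></div>'
print(parse_links_debug(sample, 429))        # None
print(parse_links_debug(sample, 200))        # []
print(parse_links_debug(sample, 200, "rc"))  # ['https://example.com/']
```

If the real function prints None, check resp.status_code first. If it prints [], the request succeeded but the selector no longer matches the markup.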

It seems that Google changed the HTML markup of its results page. Try changing the search from class="r" to class="rc":

import requests
from bs4 import BeautifulSoup


USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"

def parseLinks(url):
    headers = {"user-agent": USER_AGENT}
    resp = requests.get(url, headers=headers)
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.content, "html.parser")
        results = []
        for g in soup.find_all('div', class_='rc'): # <-- change 'r' to 'rc'
            anchors = g.find_all('a')
            if anchors:
                link = anchors[0]['href']
                results.append(link)
        return results

url = 'https://www.google.com/search?q=tree'
print(parseLinks(url))

Prints:

['https://en.wikipedia.org/wiki/Tree', 'https://simple.wikipedia.org/wiki/Tree', 'https://www.britannica.com/plant/tree', 'https://www.treepeople.org/tree-benefits', 'https://books.google.sk/books?id=yNGrqIaaYvgC&pg=PA20&lpg=PA20&dq=tree&source=bl&ots=_TP8PqSDlT&sig=ACfU3U16j9xRJgr31RraX0HlQZ0ryv9rcA&hl=sk&sa=X&ved=2ahUKEwjOq8fXyKjsAhXhAWMBHToMDw4Q6AEwG3oECAcQAg', 'https://teamtrees.org/', 'https://www.woodlandtrust.org.uk/trees-woods-and-wildlife/british-trees/a-z-of-british-trees/', 'https://artsandculture.google.com/entity/tree/m07j7r?categoryId=other']
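Class names such as r and rc are not a documented interface. In BeautifulSoup, class_="r" only matches elements whose class list contains exactly the token r, so a rename silently leaves the loop empty.

The following sketch shows one possibly less brittle heuristic. It assumes, without any guarantee, that each result title is an <h3> wrapped in a link. It starts from the <h3> elements and walks up to the enclosing <a>. The function name links_from_h3_titles and the sample markup are hypothetical, and this approach can break in the same way if Google changes its markup again.

```python
from bs4 import BeautifulSoup


def links_from_h3_titles(html):
    """Collect hrefs of anchors that wrap an <h3> title.

    Assumption: each organic result title is an <h3> inside an <a>.
    This is a heuristic about the page markup, not a documented contract.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for h3 in soup.find_all("h3"):
        # Walk up the tree to the nearest enclosing <a>, if any.
        anchor = h3.find_parent("a")
        # Skip titles that are not wrapped in a link or whose link has no href.
        if anchor is not None and anchor.has_attr("href"):
            href = anchor["href"]
            if href not in links:  # keep the first occurrence, preserve order
                links.append(href)
    return links


sample = (
    '<div class="g"><a href="https://en.wikipedia.org/wiki/Tree"><h3>Tree - Wikipedia</h3></a></div>'
    '<div class="g"><a href="https://teamtrees.org/"><h3>Team Trees</h3></a></div>'
    "<h3>People also ask</h3>"
)
print(links_from_h3_titles(sample))
# ['https://en.wikipedia.org/wiki/Tree', 'https://teamtrees.org/']
```

This function relies on document structure rather than a generated class name. Whichever selector you use, it is worth logging when the result list comes back empty, so that a markup change shows up immediately.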

Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish this content, please credit this site's URL or the original source.
