
How can I get the contents of the "feedback" box from Google searches?

When you ask a question or request the definition of a word in a Google search, Google gives you a summary of the answer in the "feedback" box.

For example, when you search for define apple you get this result:

[Image: feedback example]

To be clear, I do not need the entire page or the other results; I just need this box:

[Image: feedback example with the box highlighted]

How can I use the Requests and Beautiful Soup modules to get the contents of this "feedback" box in Python 3?

If that is not possible, can I use the Google Search API to get the contents of the "feedback" box?

I have found a similar question on SO, but the OP did not specify a language, there are no answers, and I fear the two comments are outdated, as that question was asked nearly 9 months ago.

Thank you in advance for your time and help.

This is easily done with requests and bs4; you just need to pull the text from the div with the class lr_dct_ent:

import requests
from bs4 import BeautifulSoup

h = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}
r = requests.get("https://www.google.ie/search?q=define+apple", headers=h).text
soup = BeautifulSoup(r, "lxml")  # name the parser explicitly to avoid bs4's "no parser specified" warning

print("\n".join(soup.select_one("div.lr_dct_ent").text.split(";")))

The main text is in an ordered list, and the part of speech (here, noun) is in the div with the lr_dct_sf_h class:

In [11]: r = requests.get("https://www.google.ie/search?q=define+apple", headers=h).text
In [12]: soup = BeautifulSoup(r,"lxml")    
In [13]: div = soup.select_one("div.lr_dct_ent")    
In [14]: n_v = div.select_one("div.lr_dct_sf_h").text   
In [15]: expl = [li.text for li in div.select("ol.lr_dct_sf_sens li")]    
In [16]: print(n_v)
noun

In [17]: print("\n".join(expl))
1. the round fruit of a tree of the rose family, which typically has thin green or red skin and crisp flesh.used in names of unrelated fruits or other plant growths that resemble apples in some way, e.g. custard apple, oak apple.
used in names of unrelated fruits or other plant growths that resemble apples in some way, e.g. custard apple, oak apple.
2. the tree bearing apples, with hard pale timber that is used in carpentry and to smoke food.
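
The repeated line in the output above suggests that the sense list contains nested li elements, so each nested item is printed once inside its parent and once on its own. A minimal sketch of one way to avoid that, and to cope with a page that has no definition box at all, is shown below. The function name extract_definition and the sample markup are hypothetical; only the class names come from the answer, and Google's real HTML is more complex and may have changed since.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup that reuses the class names from the answer.
SAMPLE_HTML = """
<div class="lr_dct_ent">
  <div class="lr_dct_sf_h">noun</div>
  <ol class="lr_dct_sf_sens">
    <li>the round fruit of a tree.
      <ol><li>used in names of unrelated fruits.</li></ol>
    </li>
    <li>the tree bearing apples.</li>
  </ol>
</div>
"""


def extract_definition(page_html):
    """Return the part of speech and top-level senses, or None if the box is absent."""
    soup = BeautifulSoup(page_html, "html.parser")
    box = soup.select_one("div.lr_dct_ent")
    if box is None:
        return None
    heading = box.select_one("div.lr_dct_sf_h")
    # "> li" matches only direct children, so nested items are not listed twice.
    senses = [
        li.get_text(" ", strip=True)
        for li in box.select("ol.lr_dct_sf_sens > li")
    ]
    return {
        "part_of_speech": heading.get_text(strip=True) if heading else None,
        "senses": senses,
    }


result = extract_definition(SAMPLE_HTML)
print(result["part_of_speech"])
for number, sense in enumerate(result["senses"], start=1):
    print(f"{number}. {sense}")
```

Returning None when the box is missing lets the caller decide what to do, instead of failing with an AttributeError on select_one(...).text.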

The question is a nice idea.

The program can be started with python3 defineterm.py apple

#!/usr/bin/env python3
# defineterm.py

import requests
from bs4 import BeautifulSoup
import sys
import html
import codecs

searchterm = ' '.join(sys.argv[1:])

url = 'https://www.google.com/search?q=define+' + searchterm
res = requests.get(url)
try:
    res.raise_for_status()
except requests.exceptions.HTTPError as exc:
    # Stop here: after an HTTP error there is no usable page to parse.
    sys.exit('error while loading page occurred: ' + str(exc))

text = html.unescape(res.text)
soup = BeautifulSoup(text, 'lxml')
prettytext = soup.prettify()

# The next lines save the raw page for analysis; you can comment them out.
with open('rawpage.txt', 'w', encoding='utf-8') as frawpage:
    frawpage.write(prettytext)

firsttag = soup.find('h3', class_="r")
if firsttag is not None:
    print(firsttag.getText())
    print()

# The attributes of this second tag may change, so check them if the script
# stops returning correct results. The same may apply to every tag searched for here.
secondtag = soup.find('div', {'style': 'color:#666;padding:5px 0'})
if secondtag is not None:
    print(secondtag.getText())
    print()

termtags = soup.find_all("li", {"style": "list-style-type:decimal"})

# Number the definitions starting from 1.
for count, tag in enumerate(termtags, start=1):
    print(str(count) + '. ' + tag.getText())
    print()

Make the script executable (for example with chmod +x defineterm.py).

Then this line can be added to ~/.bashrc:

alias defterm="/data/Scrape/google/defineterm.py "

replacing the path with the correct location of the script on your machine.

Then, after executing

source ~/.bashrc

the program can be started with:

defterm apple (or other term)
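
Because the script builds the URL by plain string concatenation, a term containing characters such as & or # would change the query that is actually sent. A minimal sketch of one way to encode the term with the standard library is shown below; build_search_url is a hypothetical helper name, not part of the script above.

```python
from urllib.parse import urlencode


def build_search_url(term, base="https://www.google.com/search"):
    """Build a 'define <term>' search URL with the query string safely encoded."""
    # urlencode turns spaces into '+' and escapes reserved characters like '&'.
    return base + "?" + urlencode({"q": "define " + term})


print(build_search_url("apple"))
# https://www.google.com/search?q=define+apple
print(build_search_url("rock & roll"))
# https://www.google.com/search?q=define+rock+%26+roll
```

An alternative with the same effect is to pass the query as params={"q": ...} to requests.get, which encodes it in the same way.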

The easiest way is to grab the CSS selectors for this text using SelectorGadget.

from bs4 import BeautifulSoup
import requests, lxml

headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

html = requests.get('https://www.google.de/search?q=define apple', headers=headers)
soup = BeautifulSoup(html.text, 'lxml')

syllables = soup.select_one('.frCXef span').text
phonetic = soup.select_one('.g30o5d span span').text
noun = soup.select_one('.h3TRxf span').text
print(f'{syllables}\n{phonetic}\n{noun}')

# Output:
'''
ap·ple
ˈapəl
the round fruit of a tree of the rose family, which typically has thin red or green skin and crisp flesh. Many varieties have been developed as dessert or cooking fruit or for making cider.
'''
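
Class names such as .frCXef look auto-generated and are likely to change, in which case select_one returns None and .text raises an AttributeError. As a rough sketch, the selectors can be kept in one dictionary and missing matches reported as None. The helper name extract_fields and the sample markup are hypothetical; the selectors are copied from the answer and may no longer match Google's pages.

```python
from bs4 import BeautifulSoup

# Selectors copied from the answer above; treat them as examples that may be stale.
SELECTORS = {
    "syllables": ".frCXef span",
    "phonetic": ".g30o5d span span",
    "definition": ".h3TRxf span",
}


def extract_fields(page_html, selectors=SELECTORS):
    """Map each field name to the matched text, or None when nothing matches."""
    soup = BeautifulSoup(page_html, "html.parser")
    fields = {}
    for name, css in selectors.items():
        node = soup.select_one(css)
        fields[name] = node.get_text(strip=True) if node is not None else None
    return fields


# Hypothetical stand-in for a downloaded page; the phonetic element is left out on purpose.
sample_page = (
    '<div class="frCXef"><span>ap·ple</span></div>'
    '<div class="h3TRxf"><span>the round fruit</span></div>'
)
print(extract_fields(sample_page))
```

When a field comes back as None, that is a hint to re-check the selector with SelectorGadget rather than a sign that the search itself failed.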

Alternatively, you can do the same thing using the Google Direct Answer Box API from SerpApi. It's a paid API with a free trial of 5,000 searches.

Code to integrate:

from serpapi import GoogleSearch

params = {
  "api_key": "YOUR_API_KEY",
  "engine": "google",
  "q": "define apple",
  "google_domain": "google.com",
}

search = GoogleSearch(params)
results = search.get_dict()

syllables = results['answer_box']['syllables']
phonetic = results['answer_box']['phonetic']
noun = results['answer_box']['definitions'][0] # array output
print(f'{syllables}\n{phonetic}\n{noun}')

# Output:
'''
ap·ple
ˈapəl
the round fruit of a tree of the rose family, which typically has thin red or green skin and crisp flesh. Many varieties have been developed as dessert or cooking fruit or for making cider.
'''
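
The lookups above assume that every response contains an answer_box entry with all three fields. Assuming that a query without a direct answer simply omits that key, a defensive sketch could look like the following. summarize_answer_box is a hypothetical helper, and the sample dictionary only imitates the fields used in the answer rather than a full API response.

```python
def summarize_answer_box(results):
    """Pull the fields used above out of a results dict, tolerating missing keys."""
    box = results.get("answer_box")
    if not box:
        return None
    definitions = box.get("definitions") or []
    return {
        "syllables": box.get("syllables"),
        "phonetic": box.get("phonetic"),
        "definition": definitions[0] if definitions else None,
    }


# Hypothetical sample shaped like the fields the answer reads.
sample_results = {
    "answer_box": {
        "syllables": "ap·ple",
        "phonetic": "ˈapəl",
        "definitions": ["the round fruit of a tree of the rose family"],
    }
}
print(summarize_answer_box(sample_results))
print(summarize_answer_box({}))  # no answer box in the response
```

In real use the dictionary would come from search.get_dict(), as in the code above.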

Disclaimer: I work for SerpApi.

