How can I get the contents of the “feedback” box from Google searches?
When you ask a question or request the definition of a word in a Google search, Google gives you a summary of the answer in the "feedback" box.
For example, when you search for "define apple", you get this result:
Now, I would like to make it clear that I do not need the entire page or the other results; I just need this box:
How can I use the Requests and Beautiful Soup modules to get the contents of this "feedback" box in Python 3?
If that is not possible, can I use the Google Search API to get the contents of the "feedback" box?
I have found a similar question on SO, but the OP has not specified the language, there are no answers, and I fear that the two comments are outdated, as that question was asked nearly 9 months ago.
Thank you for your time & help in advance.
It is easily done using requests and bs4: you just need to pull the text from the div with the class lr_dct_ent.
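import requests
from bs4 import BeautifulSoup

# Send a desktop browser User-Agent so Google serves its regular desktop results page
h = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}
r = requests.get("https://www.google.ie/search?q=define+apple", headers=h).text
# Name the parser explicitly; otherwise bs4 picks one itself and emits a warning
soup = BeautifulSoup(r, "lxml")
box = soup.select_one("div.lr_dct_ent")
if box is not None:  # select_one() returns None if the markup has changed
    print("\n".join(box.text.split(";")))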
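The same lookup can be tried offline, which helps when Google returns different markup than expected. Below is a minimal sketch, assuming a hand-written, hypothetical HTML snippet in place of a real results page; the helper name extract_box_text is illustrative, and it returns None instead of raising when the selector matches nothing.

```python
from bs4 import BeautifulSoup


def extract_box_text(page_html, selector="div.lr_dct_ent"):
    """Return the box's text split on ';', or None if the selector matches nothing."""
    soup = BeautifulSoup(page_html, "html.parser")
    box = soup.select_one(selector)
    if box is None:
        # A missing box is an expected outcome: Google's markup varies and changes
        return None
    # Split on ';' like the answer above, dropping empty pieces and extra spaces
    return [part.strip() for part in box.get_text().split(";") if part.strip()]


# Hypothetical, hand-written stand-in for a saved results page
SAMPLE = """
<html><body>
  <div class="other">unrelated result</div>
  <div class="lr_dct_ent">apple; noun; the round fruit of a tree of the rose family</div>
</body></html>
"""

print(extract_box_text(SAMPLE))
print(extract_box_text("<p>no box here</p>"))
```

Parsing a saved page this way lets you adjust the selector without sending a new request each time.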
The main text is in an ordered list, and the part of speech ("noun") is in the div with the lr_dct_sf_h class:
In [11]: r = requests.get("https://www.google.ie/search?q=define+apple", headers=h).text
In [12]: soup = BeautifulSoup(r,"lxml")
In [13]: div = soup.select_one("div.lr_dct_ent")
In [14]: n_v = div.select_one("div.lr_dct_sf_h").text
In [15]: expl = [li.text for li in div.select("ol.lr_dct_sf_sens li")]
In [16]: print(n_v)
noun
In [17]: print("\n".join(expl))
1. the round fruit of a tree of the rose family, which typically has thin green or red skin and crisp flesh.used in names of unrelated fruits or other plant growths that resemble apples in some way, e.g. custard apple, oak apple.
used in names of unrelated fruits or other plant growths that resemble apples in some way, e.g. custard apple, oak apple.
2. the tree bearing apples, with hard pale timber that is used in carpentry and to smoke food.
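In the output above, the sub-sense "used in names of unrelated fruits..." is printed twice. Most likely the descendant selector ol.lr_dct_sf_sens li matches nested list items too, while the outer item's .text already contains them. A sketch of one way to print each top-level sense once, assuming (hypothetically) that sub-senses sit in a nested <ol> inside their parent <li>; the sample markup is hand-written, not Google's actual HTML, and top_level_senses is an illustrative name:

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the output above: the first sense contains
# a nested list with a sub-sense. Real Google markup may differ.
SAMPLE = """
<div class="lr_dct_ent">
  <div class="lr_dct_sf_h">noun</div>
  <ol class="lr_dct_sf_sens">
    <li>the round fruit of a tree of the rose family.
      <ol><li>used in names of unrelated fruits.</li></ol>
    </li>
    <li>the tree bearing apples.</li>
  </ol>
</div>
"""


def top_level_senses(div):
    """Return each top-level sense once, without the text of nested sub-senses."""
    senses = []
    # '>' restricts the match to direct children, so nested <li> are skipped
    for li in div.select("ol.lr_dct_sf_sens > li"):
        # Keep only text nodes that belong to this <li>, not to a nested list
        own_text = "".join(li.find_all(string=True, recursive=False)).strip()
        senses.append(own_text)
    return senses


soup = BeautifulSoup(SAMPLE, "html.parser")
div = soup.select_one("div.lr_dct_ent")
print(div.select_one("div.lr_dct_sf_h").text)
for number, sense in enumerate(top_level_senses(div), start=1):
    print(f"{number}. {sense}")
```

If you do want the sub-senses, you could select the nested list items separately and print them indented under their parent instead of discarding them.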
Nice idea for a question.
The program can be started with: python3 defineterm.py apple
#! /usr/bin/env python3
# defineterm.py
import sys

import requests
from bs4 import BeautifulSoup

searchterm = ' '.join(sys.argv[1:])
# Let requests URL-encode the query (spaces, '&', etc.)
res = requests.get('https://www.google.com/search', params={'q': 'define ' + searchterm})
try:
    res.raise_for_status()
except requests.RequestException as exc:
    print('error while loading page occurred: ' + str(exc))
    sys.exit(1)

# BeautifulSoup decodes HTML entities itself; unescaping the raw HTML first
# could turn escaped text such as &lt; into stray markup
soup = BeautifulSoup(res.text, 'lxml')

# The next lines are for analysis (saving the raw page); you can comment them out
with open('rawpage.txt', 'w', encoding='utf-8') as frawpage:
    frawpage.write(soup.prettify())

firsttag = soup.find('h3', class_='r')
if firsttag is not None:
    print(firsttag.get_text())
    print()

# The second tag may change, so check it if the result is not correct.
# The same may apply to all of the searched tags.
secondtag = soup.find('div', {'style': 'color:#666;padding:5px 0'})
if secondtag is not None:
    print(secondtag.get_text())
    print()

termtags = soup.find_all('li', {'style': 'list-style-type:decimal'})
for count, tag in enumerate(termtags, start=1):
    print(str(count) + '. ' + tag.get_text())
print()
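Concatenating the search term straight into the URL (as in 'https://www.google.com/search?q=define+' + searchterm) leaves spaces and characters such as & unescaped, so a term like "R&D" would be split into two query parameters. A small standard-library sketch of safe encoding follows; requests does the equivalent internally when you pass params= to requests.get. The helper name build_search_url is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(term):
    """Build a Google search URL for 'define <term>' with the query safely encoded."""
    # urlencode turns spaces into '+' and escapes characters such as '&' and '#',
    # which plain string concatenation would pass through unescaped
    return "https://www.google.com/search?" + urlencode({"q": "define " + term})


print(build_search_url("apple"))
print(build_search_url("R&D"))
# Decoding the query string again gives back the original search text
print(parse_qs(urlsplit(build_search_url("R&D")).query))
```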
Make the script executable:
chmod +x defineterm.py
Then add this line to ~/.bashrc, putting the correct path to the script on your machine:
alias defterm="/data/Scrape/google/defineterm.py "
Then execute:
source ~/.bashrc
The program can now be started with:
defterm apple (or another term)
The easiest way is to grab the CSS selectors for this text using SelectorGadget.
from bs4 import BeautifulSoup
import requests, lxml

headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Pass the query via params so requests URL-encodes it
html = requests.get('https://www.google.de/search', params={'q': 'define apple'}, headers=headers)
soup = BeautifulSoup(html.text, 'lxml')

# Class names such as frCXef look auto-generated and are likely to change; if a
# selector stops matching, select_one() returns None and .text raises AttributeError
syllables = soup.select_one('.frCXef span').text
phonetic = soup.select_one('.g30o5d span span').text
noun = soup.select_one('.h3TRxf span').text

print(f'{syllables}\n{phonetic}\n{noun}')
# Output:
'''
ap·ple
ˈapəl
the round fruit of a tree of the rose family, which typically has thin red or green skin and crisp flesh. Many varieties have been developed as dessert or cooking fruit or for making cider.
'''
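Because obfuscated class names like the ones above can stop matching at any time, a small helper that returns None instead of raising makes failures easier to spot. A sketch, run against a hypothetical hand-written snippet in which the phonetic element is missing; text_or_none is an illustrative name, not part of bs4:

```python
from bs4 import BeautifulSoup


def text_or_none(soup, selector):
    """Return the stripped text of the first match for a CSS selector, or None."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


# Hypothetical snippet reusing the class names from the answer above;
# the phonetic element is deliberately missing.
SAMPLE = (
    '<div class="frCXef"><span>ap·ple</span></div>'
    '<div class="h3TRxf"><span>the round fruit</span></div>'
)
soup = BeautifulSoup(SAMPLE, "html.parser")

# Field name -> CSS selector, as found with SelectorGadget
fields = {
    "syllables": ".frCXef span",
    "phonetic": ".g30o5d span span",
    "noun": ".h3TRxf span",
}
result = {name: text_or_none(soup, sel) for name, sel in fields.items()}
print(result)
```

A None value then points directly at the selector that needs updating.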
Alternatively, you can do the same thing using the Google Direct Answer Box API from SerpApi. It's a paid API with a free trial of 5,000 searches.
Code to integrate:
from serpapi import GoogleSearch
params = {
"api_key": "YOUR_API_KEY",
"engine": "google",
"q": "define apple",
"google_domain": "google.com",
}
search = GoogleSearch(params)
results = search.get_dict()
syllables = results['answer_box']['syllables']
phonetic = results['answer_box']['phonetic']
noun = results['answer_box']['definitions'][0]  # 'definitions' is a list; take the first entry
print(f'{syllables}\n{phonetic}\n{noun}')
# Output:
'''
ap·ple
ˈapəl
the round fruit of a tree of the rose family, which typically has thin red or green skin and crisp flesh. Many varieties have been developed as dessert or cooking fruit or for making cider.
'''
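Not every query returns an answer box, so indexing results['answer_box'] directly can raise KeyError. A minimal sketch using dict.get(), assuming only the field names used above; summarize_answer_box is a hypothetical helper and fake_results is hand-written, not a real SerpApi response:

```python
def summarize_answer_box(results):
    """Pull syllables, phonetic and the first definition out of a results dict.

    Returns None when there is no answer box, and None for any missing field.
    """
    box = results.get("answer_box")
    if box is None:
        return None
    definitions = box.get("definitions") or []
    return {
        "syllables": box.get("syllables"),
        "phonetic": box.get("phonetic"),
        "definition": definitions[0] if definitions else None,
    }


# Hand-written stand-in for search.get_dict(); not a real API response
fake_results = {
    "answer_box": {
        "syllables": "ap·ple",
        "phonetic": "ˈapəl",
        "definitions": ["the round fruit of a tree of the rose family"],
    }
}
print(summarize_answer_box(fake_results))
print(summarize_answer_box({"organic_results": []}))
```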
Disclaimer: I work for SerpApi.