
Missing information in scraped web data (Google Translate, using Python)

I want to scrape the Google Translate website and get the translated text from it using Python 3.

Here is my code:

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

my_url = "https://translate.google.com/#en/es/I%20am%20Animikh%20Aich"

# Send a browser-like User-Agent; urllib's default one may be refused
req = Request(my_url, headers={'User-Agent': 'Mozilla/5.0'})

# The context manager closes the connection even if read() fails
with urlopen(req) as response:
    page_html = response.read()

html = BeautifulSoup(page_html, 'html5lib')
print(html)

Unfortunately, I cannot find the required information in the parsed webpage. Chrome's "Inspect" tool shows that the translated text is inside:

 <span id="result_box" class="short_text" lang="es"><span class="">Yo soy Animikh Aich</span></span>

However, when I search the parsed HTML, this is all I find:

<span class="short_text" id="result_box"></span>

I have tried parsing with html5lib, lxml, and html.parser, but have not been able to find a solution. Please help me with this issue.
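Switching parsers cannot help here, because the translated text is not in the downloaded bytes at all. As a minimal sketch using only the standard library's html.parser (the `ResultBoxText` class and `result_box_text` function are illustrative names, not part of any library), parsing the two snippets quoted above shows the difference:

```python
from html.parser import HTMLParser


class ResultBoxText(HTMLParser):
    """Collect the text inside the element whose id is "result_box".

    Assumes well-nested markup without unclosed void elements such as <br>.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while we are inside #result_box
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == "result_box":
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def result_box_text(html):
    parser = ResultBoxText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


# What urllib downloads vs. what Chrome's Inspect shows after JavaScript runs
downloaded = '<span class="short_text" id="result_box"></span>'
rendered = ('<span id="result_box" class="short_text" lang="es">'
            '<span class="">Yo soy Animikh Aich</span></span>')

print(repr(result_box_text(downloaded)))  # → ''
print(repr(result_box_text(rendered)))    # → 'Yo soy Animikh Aich'
```

Any correct parser gives the same empty result for the downloaded markup; the text only exists after the browser has run the page's scripts.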

You could use a third-party Python library that wraps Google Translate, such as goslate:

import goslate

gs = goslate.Goslate()
# Translate the English sentence into Spanish ('es')
print(gs.translate('I am Animikh Aich', 'es'))
# Output: Yo soy Animikh Aich

JavaScript modifies the HTML after the page loads. urllib cannot execute JavaScript, so it only receives the initial, empty result_box; to get the data you want, you need a browser-automation tool such as Selenium.
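The URL itself shows why the server's reply cannot contain the translation: everything after # is a URL fragment, which HTTP clients such as urllib strip before sending the request. The text to translate therefore never reaches the server and can only be picked up by scripts running in the browser. A small standard-library sketch makes this visible (Request.selector is the path that urllib actually requests):

```python
from urllib.parse import unquote, urlsplit
from urllib.request import Request

my_url = "https://translate.google.com/#en/es/I%20am%20Animikh%20Aich"

# urllib strips the fragment (the part after '#') before sending; the
# request line only carries this selector (path + query string).
req = Request(my_url, headers={'User-Agent': 'Mozilla/5.0'})
print(req.selector)                        # → /

# The text to translate lives only in the fragment, which stays client-side.
print(unquote(urlsplit(my_url).fragment))  # → en/es/I am Animikh Aich
```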

For Selenium installation instructions and a demo, refer to this link.

Try something like the following to get the desired content (the element IDs reflect the page layout at the time and may have changed since):

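from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()  # starts a real Chrome instance (needs a matching chromedriver)
try:
    driver.get("https://translate.google.com/#en/es/I%20am%20Animikh%20Aich")
    # Wait up to 10 s for JavaScript to insert the translated text
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "#result_box span")))
    soup = BeautifulSoup(driver.page_source, 'html5lib')
    item = soup.select_one("#result_box span").text
    print(item)
finally:
    driver.quit()  # always close the browser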

Output:

Yo soy Animikh Aich
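Because JavaScript fills in the result asynchronously, reading driver.page_source immediately after driver.get may capture the page before the translation appears. Selenium provides WebDriverWait for this; the polling idea behind such a wait can be sketched with the standard library alone. The `wait_until` helper below is hypothetical and is not Selenium's own implementation:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 10.0,
               poll: float = 0.5) -> T:
    """Call `condition` until it returns a truthy value or `timeout` expires.

    Hypothetical helper illustrating explicit-wait polling; not Selenium code.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page whose result box is filled in on the third check
checks = {"n": 0}


def fake_result_text() -> str:
    checks["n"] += 1
    return "Yo soy Animikh Aich" if checks["n"] >= 3 else ""


print(wait_until(fake_result_text, timeout=5, poll=0.01))  # → Yo soy Animikh Aich
```

With a live driver, the condition could be something like lambda: driver.find_elements(By.CSS_SELECTOR, "#result_box span"), which returns a non-empty (truthy) list once the span exists. In practice, Selenium's built-in WebDriverWait is the better choice; the sketch only shows why waiting is needed.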
