Information missing from web page data scraped from Google Translate with Python

Question

I want to scrape the Google Translate website and get the translated text from it using Python 3.

Here is my code:

from bs4 import BeautifulSoup as soup
from urllib.request import Request as uReq
from urllib.request import urlopen as uOpen  # alias chosen so the built-in open() is not shadowed


my_url = "https://translate.google.com/#en/es/I%20am%20Animikh%20Aich"

# Download the page with a browser-like User-Agent header
req = uReq(my_url, headers={'User-Agent':'Mozilla/5.0'})
uClient = uOpen(req)
page_html = uClient.read()
uClient.close()
# Parse the downloaded HTML and print it
html = soup(page_html, 'html5lib')
print(html)

Unfortunately, I cannot find the information I need in the parsed page. In Chrome's "Inspect" view, the translated text appears inside:

 <span id="result_box" class="short_text" lang="es"><span class="">Yo soy Animikh Aich</span></span>

However, when I search for it in the parsed HTML, this is all I find:

<span class="short_text" id="result_box"></span>

I have tried parsing with html5lib, lxml, and html.parser, and none of them returns the text. Please help me solve this.

Answer 1

You can use a dedicated Python API instead of scraping the page:

import goslate
gs = goslate.Goslate()
print(gs.translate('I am Animikh Aich', 'es'))

Output:

Yo soy Animikh Aich

Answer 2

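The HTML is modified by JavaScript after the page loads. urllib does not execute JavaScript, so you have to use something like Selenium to get the data you need.

For installation and a demo, refer to this link.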
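
To see the difference between what urllib downloads and what the browser shows, it can help to run the same lookup against both HTML fragments from the question. The following is only a minimal sketch using the standard library; the class name ResultBoxParser and the function result_box_text are made up for this illustration, and the same check could be done with BeautifulSoup.

```python
from html.parser import HTMLParser


class ResultBoxParser(HTMLParser):
    """Collect the text found inside the element with id="result_box".

    Hypothetical helper for illustration. It assumes the element contains
    no void tags such as <br>, which would upset the depth counter.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside the result_box element
        self.chunks = []  # text pieces collected inside that element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == "result_box":
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def result_box_text(html):
    """Return the stripped text inside #result_box, or '' if it is empty or absent."""
    parser = ResultBoxParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# What urllib received: the span exists, but JavaScript has not filled it in.
static_html = '<span class="short_text" id="result_box"></span>'

# What Chrome's "Inspect" shows after the page's JavaScript has run.
rendered_html = (
    '<span id="result_box" class="short_text" lang="es">'
    '<span class="">Yo soy Animikh Aich</span></span>'
)

print(repr(result_box_text(static_html)))
print(repr(result_box_text(rendered_html)))
```

An empty result for the downloaded HTML, together with a non-empty one for the markup copied from the browser, suggests that the parser choice (html5lib, lxml, html.parser) is not the problem; the text simply is not in the response until a script inserts it.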

Answer 3

Try the following to get the content you want:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://translate.google.com/#en/es/I%20am%20Animikh%20Aich")
soup = BeautifulSoup(driver.page_source, 'html5lib')
item = soup.select_one("#result_box span").text
print(item)
driver.quit()

Output:

Yo soy Animikh Aich
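
The snippet above reads page_source right after driver.get() returns, so it may run before the script has filled in the result. One way to guard against that is to poll until the element has text. This is a rough sketch, not the answerer's method: wait_for_text and fetch_translation are hypothetical names, the 10 second timeout is an arbitrary assumption, and the "#result_box span" selector is taken from the question and may not match the markup the site serves today. Selenium also ships its own WebDriverWait helper for the same purpose.

```python
import time


def wait_for_text(read_text, timeout=10.0, poll_interval=0.5):
    """Call read_text() until it returns a non-empty string or the timeout expires.

    Hypothetical helper: read_text is any zero-argument callable returning a str.
    """
    deadline = time.monotonic() + timeout
    while True:
        text = read_text()
        if text:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError("no text appeared within %.1f seconds" % timeout)
        time.sleep(poll_interval)


def fetch_translation(url, selector="#result_box span", timeout=10.0):
    """Open url in Chrome and return the text of the first element matching selector.

    The default selector is the one from the question (an assumption about the page).
    """
    # Imported here so that wait_for_text can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)

        def read_result():
            # find_elements returns an empty list while the element does not exist yet
            elements = driver.find_elements(By.CSS_SELECTOR, selector)
            return elements[0].text.strip() if elements else ""

        return wait_for_text(read_result, timeout=timeout)
    finally:
        # Close the browser even if the wait times out
        driver.quit()
```

It would be called as print(fetch_translation("https://translate.google.com/#en/es/I%20am%20Animikh%20Aich")). Keeping the polling loop separate from the browser code means it can be tried out with any callable, without launching a browser.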

3 answers

Answer 1
2 2018-01-12 13:06:37

Answer 2
1 2018-01-12 14:05:42

Answer 3
1 (accepted) 2018-01-12 19:08:32
