I can't get all the HTML data from Beautiful Soup
I'm new to web scraping. I just want to get some text from a Google results page (basically the date of a football match), but the soup doesn't contain all of the HTML (I'm guessing because of requests), so I can't find it. I know this is probably because Google uses JavaScript and that I should use Selenium with chromedriver, but the problem is that the code needs to work on another computer, so I can't really use that.
Here's the code:
import pandas as pd
from bs4 import BeautifulSoup
import requests
a = "Newcastle"
url ="https://www.google.com/search?q=" + a + "+next+match"
response = requests.get(url)
soup = BeautifulSoup(response.text,"html.parser")
print(soup)
for a in soup.findAll('div'):
    print(soup.get_text())
What I want to find is
"<span class="imso_mh__lr-dt-ds">17/12, 13:30</span>"
which has
"//*[@id="sports-app"]/div/div[3]/div[1]/div/div/div/div/div[1]/div/div[1]/div/span[2]"
as its XPath.
Is this possible?
Try setting a User-Agent header when requesting the page from Google:
import requests
from bs4 import BeautifulSoup
a = "Newcastle"
url = "https://www.google.com/search?q=" + a + "+next+match&hl=en"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:106.0) Gecko/20100101 Firefox/106.0"
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
next_match = soup.select_one('[data-entityname="Match Header"]')
for t in next_match.select('[aria-hidden="true"]'):
    t.extract()
text = next_match.get_text(strip=True, separator=" ")
print(text)
Prints:
Club Friendlies · Dec 17, 13:30 Newcastle VS Vallecano
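If you only need the date span mentioned in the question, you can also select it directly by its class, assuming Google still renders the `imso_mh__lr-dt-ds` class (Google's class names change often, so the `data-entityname` selector above is likely more stable). A minimal offline sketch using a sample fragment:

```python
from bs4 import BeautifulSoup

# Sample HTML mimicking the fragment from the question; the real page
# must be fetched with the User-Agent header shown above.
html = '<div><span class="imso_mh__lr-dt-ds">17/12, 13:30</span></div>'

soup = BeautifulSoup(html, "html.parser")
date_span = soup.select_one("span.imso_mh__lr-dt-ds")
print(date_span.get_text())  # → 17/12, 13:30
```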