How can I scrape node text from a JavaScript pie chart graph using Python?
How can I use Python to scrape the nodes from a JavaScript pie chart graph like this one?
https://www.dice.com/skills/javascript
Note: the text I want to scrape from the graph represents the graph's nodes, not ordinary page text.
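The page is actually rendered by JavaScript, so we can use selenium. Alternatively, we can use requests and bs4, because the desired output sits inside a script tag and can be captured with a regex.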
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# Run Firefox without opening a window.
options = Options()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)
try:
    driver.get('https://www.dice.com/skills/javascript')
    # Parse the DOM after JavaScript has rendered the chart.
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    # Each chart node is a <div class="node">.
    for item in soup.find_all("div", {'class': 'node'}):
        print(item.text)
finally:
    # Close the browser even if scraping fails.
    driver.quit()
Output:
JavaScript
CSS
HTML5
jQuery
jQuery UI
AngularJS
Bootstrap
jQuery
jQuery UI
CSS
HTML5
AngularJS
Ajax
jQuery UI
jQuery
Aptana
Zend Studio
Ajax
CSS
AngularJS
jQuery
HTML5
HTML
Bootstrap
CSS
Node.js
React.js
MongoDB
AngularJS
Express.js
NoSQL
Update:
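import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.dice.com/skills/javascript")
soup = BeautifulSoup(r.text, 'html.parser')
# Index 8 is position-dependent: it breaks if the page adds or removes a <script> tag.
script = soup.find_all("script")[8].get_text("\t", strip=True)
# Slice from the first "{" up to the first ";" to isolate the JSON object.
start = script.find("{")
end = script.find(";")
print(script[start:end])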
Output:
{"name":"JavaScript","children":[{"name":"CSS","children":[{"name":"HTML5"},{"name":"jQuery"},{"name":"jQuery UI"},{"name":"AngularJS"},{"name":"Bootstrap"}]},{"name":"jQuery","children":[{"name":"jQuery UI"},{"name":"CSS"},{"name":"HTML5"},{"name":"AngularJS"},{"name":"Ajax"}]},{"name":"jQuery UI","children":[{"name":"jQuery"},{"name":"Aptana"},{"name":"Zend Studio"},{"name":"Ajax"},{"name":"CSS"}]},{"name":"AngularJS","children":[{"name":"jQuery"},{"name":"HTML5"},{"name":"HTML"},{"name":"Bootstrap"},{"name":"CSS"}]},{"name":"Node.js","children":[{"name":"React.js"},{"name":"MongoDB"},{"name":"AngularJS"},{"name":"Express.js"},{"name":"NoSQL"}]}]}
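Slicing up to the first ";" would cut the data short if a node name ever contained a semicolon, and the raw JSON still has to be turned into the node list. A minimal sketch of one way to handle both, assuming the script text looks roughly like the hypothetical `script_text` below (the variable name `data` and the trailing call are invented for illustration, and the JSON is an abbreviated copy of the output above): a regex locates the start of the object, `json.JSONDecoder.raw_decode` parses exactly one JSON value from that position, and a pre-order walk yields the names in the same order as the selenium output.

```python
import json
import re

# Hypothetical stand-in for the text of the page's <script> tag. The variable
# name `data` and the trailing call are assumptions; the JSON is an
# abbreviated copy of the output shown above.
script_text = (
    'var data = {"name":"JavaScript","children":['
    '{"name":"CSS","children":[{"name":"HTML5"},{"name":"jQuery"}]},'
    '{"name":"Node.js","children":[{"name":"React.js"},{"name":"MongoDB"}]}'
    ']}; draw(data);'
)


def extract_tree(text):
    """Find the first '{"name":' with a regex and decode the JSON object there."""
    match = re.search(r'\{\s*"name"\s*:', text)
    if match is None:
        raise ValueError("no JSON object with a 'name' key found")
    # raw_decode parses one JSON value and ignores whatever follows it.
    tree, _end = json.JSONDecoder().raw_decode(text, match.start())
    return tree


def iter_names(node):
    """Yield node names in pre-order: a parent, then each of its children."""
    yield node["name"]
    for child in node.get("children", []):
        yield from iter_names(child)


names = list(iter_names(extract_tree(script_text)))
print("\n".join(names))
```

With the real page, `script_text` would be replaced by the text of the relevant script tag from the requests and bs4 code; searching the tags for the one that matches the regex avoids depending on a fixed index.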