
Scraping data from a website using Python

Is it possible to extract the data from the graphs on this website using Python code? https://xsi.xeneta.com/

Yes, assuming the data exists in the page source, you could use requests to fetch the page and then extract the data you want. It would look something like this:

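import requests

# Fetch the page; raise an error on a 4xx/5xx response
page = requests.get(url="https://xsi.xeneta.com/")
page.raise_for_status()
# Raw response body as bytes (use page.text for a decoded string)
data = page.content
print(data)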

This would at least give you a starting point for whatever processing you want to do.

Some attributes and methods of the response object that might be helpful here: https://www.w3schools.com/python/ref_requests_response.asp
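As one example of such processing: charts are often embedded through iframes rather than written into the page itself, so a reasonable first step is to list the iframe URLs found in the fetched HTML. The following is only a sketch using the standard library; the names IframeFinder and find_iframe_sources and the sample HTML are made up for illustration, and in practice you would pass in page.text from the request above.

```python
from html.parser import HTMLParser


class IframeFinder(HTMLParser):
    """Collect the src attribute of every <iframe> tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_iframe_sources(html):
    """Return the src URLs of all iframes in an HTML string."""
    finder = IframeFinder()
    finder.feed(html)
    return finder.sources


# Made-up HTML standing in for page.text; not the real page source
sample_html = '<div><iframe src="https://example.com/chart/1/"></iframe><p>text</p></div>'
print(find_iframe_sources(sample_html))
```

If the list comes back empty, the iframes are probably added by JavaScript after the page loads, and plain requests will not see them.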

If you inspect the graph, you'll see it's nested inside an iframe. I took the URL of the first graph and navigated directly to it instead of going through xsi.xeneta.com. You can also see that there's a lot of data in the data-json attribute, so this code prints that data using Selenium.

Install the dependencies:

pip install selenium
pip install webdriver-manager

Code:

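from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 expects the driver path to be wrapped in a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.implicitly_wait(5)
driver.get("https://xsi-short.xeneta.com/xsic/chart/asia-europe/")
# find_element_by_xpath was removed in Selenium 4; use find_element with By.XPATH
canvas = driver.find_element(By.XPATH, '//*[@id="chart-visualization-b9948b5ccd27f73bf764abe4a935c502"]')
print(canvas.get_attribute("data-json"))
driver.quit()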
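The data-json value is a string, so to work with it you would normally parse it with json.loads. The sketch below also shows a browser-free variant that pulls every data-json attribute out of an HTML string. This only helps if the attribute is already present in the static HTML, which I have not verified for this site. DataJsonFinder, extract_data_json and the sample markup are made up for illustration; the real payload's structure will differ.

```python
import json
from html.parser import HTMLParser


class DataJsonFinder(HTMLParser):
    """Collect and parse every data-json attribute (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already unescaped entities such as &quot; here
        raw = dict(attrs).get("data-json")
        if raw is not None:
            self.payloads.append(json.loads(raw))


def extract_data_json(html):
    """Return the parsed data-json payloads found in an HTML string."""
    finder = DataJsonFinder()
    finder.feed(html)
    return finder.payloads


# Made-up markup; the real chart element and its JSON look different
sample_html = (
    '<div id="chart-visualization-demo" '
    'data-json="{&quot;labels&quot;: [&quot;w1&quot;, &quot;w2&quot;], '
    '&quot;values&quot;: [10, 12]}"></div>'
)
charts = extract_data_json(sample_html)
print(charts[0]["values"])
```

With the Selenium code above, the same parsing step is just json.loads(canvas.get_attribute("data-json")).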
