
How can I do web scraping through XPath from dynamic charts with multiple criteria options?

I am very new to scraping and to programming in general, which is why I am asking for help with the following issue. There is a web site at the URL below, and I need to get data from its dynamic charts. The code has to loop through all the days for which data is required, and through all the elements containing the data.

The first issue is that I need to somehow get the data located by the XPath. The second is that I have to write the loop to get all the required information.

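from selenium import webdriver
from selenium.webdriver.common.by import By
import requests
import pandas as pd
import time

url = "https://www.oree.com.ua/index.php/control/results_mo/DAM"

# PhantomJS support was removed in Selenium 4, so this line needs Selenium 3.x
browser = webdriver.PhantomJS(executable_path = "C:/ProgramData/Anaconda3/Lib/site-packages/phantomjs-2.1.1-windows/bin/phantomjs")
browser.get(url)
time.sleep(2)  # crude wait for the page to finish loading

# Selenium locators must select elements, not text nodes,
# so the trailing /text() is dropped and .text is read instead
elements = browser.find_elements(By.XPATH, "/html/body/div[5]/div[1]/div[3]/div[3]/div/div/table/tbody/tr[1]/td[3]")
for element in elements:
    print(element.text)

browser.quit()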

I am not sure Selenium is required here. You can get the data directly from this mixed HTML/JSON object (change the date according to your needs):

https://www.oree.com.ua/index.php/PXS/get_pxs_hdata/04.04.2020/DAM/1
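Since the question asks for a loop over several days, the date segment of this URL can be generated rather than typed by hand. Here is a minimal sketch, assuming the endpoint accepts any date in the same DD.MM.YYYY form as the example and that the trailing /DAM/1 part stays fixed. The function name daily_urls is made up for illustration.

```python
from datetime import date, timedelta

# Base of the URL shown above; everything after it is date/market/1
BASE = "https://www.oree.com.ua/index.php/PXS/get_pxs_hdata"

def daily_urls(start, end, market="DAM"):
    """Yield (day, url) pairs for every day from start to end inclusive."""
    day = start
    while day <= end:
        # Assumption: dates are always formatted like 04.04.2020
        yield day, f"{BASE}/{day.strftime('%d.%m.%Y')}/{market}/1"
        day += timedelta(days=1)

urls = list(daily_urls(date(2020, 4, 4), date(2020, 4, 6)))
for day, url in urls:
    print(day, url)
```

Each generated URL could then be fetched in turn, for example with requests.get(url), with a short pause between calls.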

Then query the response with this XPath:

//tbody//tr/td[i]

Where i is the column of interest. The range of i is 3-7. Column 3 is "Sales volume, MW.h", column 4 is "Purchase volume, MW.h", and so on.

Output for sales volume (04/04/2020): [image: volume]
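The XPath step can be sketched in Python with lxml, which is an assumption here since the answer does not name a parsing library. The sample markup below is made up to stand in for the real response, and the helper name column_values is hypothetical. Because the endpoint is described as a mixed HTML/JSON object, the HTML part may first need to be pulled out of its JSON wrapper; the wrapper's layout is not given above, so that step is left out.

```python
from lxml import html

def column_values(markup, i):
    """Return the stripped text of the i-th cell in every table body row."""
    tree = html.fromstring(markup)
    # Same expression as in the answer, with i substituted in
    cells = tree.xpath(f"//tbody//tr/td[{i}]")
    return [cell.text_content().strip() for cell in cells]

# Made-up sample table; the real response will have more columns and rows
sample = (
    "<table><tbody>"
    "<tr><td>1</td><td>00:00</td><td>10.5</td><td>20.0</td></tr>"
    "<tr><td>2</td><td>01:00</td><td>11.0</td><td>21.5</td></tr>"
    "</tbody></table>"
)

sales = column_values(sample, 3)
print(sales)  # ['10.5', '11.0']
```

With the real data, calling column_values for each i in range(3, 8) would cover all five columns mentioned in the answer.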

