Extracting a table from a webpage

I am working on an automation project and need the historical exchange rates from this page: http://bnro.ro/files/xml/nbrfxrates2017.htm, as well as from the equivalent pages for other years.

The problem is that BS (BeautifulSoup) doesn't seem to work, since the table is loaded from an XML file. Selenium is not an option, since the program needs to run in the background (unless that is possible with Selenium), and neither is the Forex module, since its rates are slightly different.

Is it possible to get the data from this table or from the XML file? Or do I have to ask them for their archives?

As you said, the data is loaded from an XML file. If you check the Network tab in the Developer Tools, you can see that the XML file is obtained by sending a request to this URL: http://bnro.ro/files/xml/years/nbrfxrates2017.xml

You can use this URL to fetch the data with the requests module.

import requests

r = requests.get('http://bnro.ro/files/xml/years/nbrfxrates2017.xml')
r.raise_for_status()  # Raise an error on a non-2xx HTTP response.
print('2017-01-03' in r.text)  # Check that the response contains a known date.
# True
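
Once the XML has been downloaded, it can be turned into rows with the standard library. The following is only a sketch. It assumes the file groups rates in `Cube` elements carrying a `date` attribute, each containing `Rate` elements with a `currency` attribute and an optional `multiplier` attribute; verify this against the actual file. The helper names `url_for_year` and `parse_rates` are hypothetical, the URL pattern for other years is extrapolated from the 2017 URL above, and the values in `SAMPLE` are made-up placeholders, not real rates.

```python
import xml.etree.ElementTree as ET

# Assumption: other years follow the same URL pattern as the 2017 file.
def url_for_year(year):
    return 'http://bnro.ro/files/xml/years/nbrfxrates{}.xml'.format(year)

def local_name(tag):
    # Strip an optional '{namespace}' prefix from an element tag.
    return tag.rsplit('}', 1)[-1]

def parse_rates(xml_data):
    """Return a list of (date, currency, value, multiplier) tuples.

    Assumes a <Cube date="..."> / <Rate currency="..."> layout.
    """
    root = ET.fromstring(xml_data)
    rows = []
    for cube in root.iter():
        if local_name(cube.tag) != 'Cube':
            continue
        date = cube.get('date')
        for rate in cube:
            if local_name(rate.tag) != 'Rate':
                continue
            try:
                value = float((rate.text or '').strip())
            except ValueError:
                continue  # Skip entries without a numeric value.
            multiplier = int(rate.get('multiplier', '1'))
            rows.append((date, rate.get('currency'), value, multiplier))
    return rows

# Made-up sample in the assumed layout; the numbers are placeholders.
SAMPLE = """
<DataSet xmlns="http://www.bnr.ro/xsd">
  <Body>
    <Cube date="2017-01-03">
      <Rate currency="EUR">4.5</Rate>
      <Rate currency="HUF" multiplier="100">1.5</Rate>
    </Cube>
  </Body>
</DataSet>
"""

for row in parse_rates(SAMPLE):
    print(row)
```

With a live response, this would be called as `parse_rates(r.content)`, and the same call could be repeated for each year using `requests.get(url_for_year(year))`. Matching on the local tag name keeps the sketch working whether or not the file declares an XML namespace.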
