
Understanding how to extract data from an HTML file

I am trying to access the "Yield Curve Data" available on this page. The page has a radio button; choosing an option and clicking "Submit" produces a zip file, and that file contains the data I want. I am looking for the data from the "Retrieve all data" option. My code is below. From the output of print result.read() I can see that result is actually an HTML document. My difficulty is in understanding how to extract the data from result, since I don't see any data in it. I am not sure where to go from here.

import urllib, urllib2
import csv
from StringIO import StringIO
import pandas as pd
import os
from zipfile import ZipFile

my_url = 'http://www.bankofcanada.ca/rates/interest-rates/bond-yield-curves/'
data = urllib.urlencode({'lastchange': 'all'}) 
request = urllib2.Request(my_url, data)
result = urllib2.urlopen(request)

Thank you.

You're going to need to send a POST request to the following endpoint:

http://www.bankofcanada.ca/stats/results/csv

With the following form data:

lookupPage: lookup_yield_curve.php
startRange: 1986-01-01
searchRange: all

This should give you the file.

You may also need to spoof your User-Agent header, in case the server rejects requests from scripts.
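As a rough illustration of the request described above, here is a minimal sketch in Python 3 using only the standard library (the question's code is Python 2, where the equivalent modules are urllib and urllib2). The endpoint and form fields are the ones listed in this answer. The User-Agent string, the helper names build_request, extract_text and fetch_yield_curve, and the UTF-8 decoding are assumptions made for the example. It is also an assumption that the response may be either a zip archive or a plain CSV, so the sketch checks for both.

```python
import io
import urllib.parse
import urllib.request
import zipfile

# Endpoint and form fields as given in the answer above.
ENDPOINT = "http://www.bankofcanada.ca/stats/results/csv"
FORM_DATA = {
    "lookupPage": "lookup_yield_curve.php",
    "startRange": "1986-01-01",
    "searchRange": "all",
}


def build_request(url=ENDPOINT, form=FORM_DATA, user_agent="Mozilla/5.0"):
    """Build the POST request (hypothetical helper, not from the answer).

    Supplying `data` makes urllib send a POST instead of a GET.
    The User-Agent value is an arbitrary browser-like placeholder.
    """
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(url, data=body, headers={"User-Agent": user_agent})


def extract_text(payload):
    """Return the payload as text, unzipping the first member if it is a zip.

    Assumption: the server may answer with a zip archive or with raw CSV.
    """
    buffer = io.BytesIO(payload)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            payload = archive.read(archive.namelist()[0])
    # Assumption: UTF-8; undecodable bytes are replaced rather than raising.
    return payload.decode("utf-8", errors="replace")


def fetch_yield_curve():
    """Send the request and return the data as text (requires network access)."""
    with urllib.request.urlopen(build_request()) as response:
        return extract_text(response.read())
```

Calling fetch_yield_curve() should return the file contents as a string, provided the endpoint still behaves as described. That string could then be passed to pandas.read_csv(io.StringIO(text)), though the file's exact layout (for example, any preamble lines before the header row) would need checking first.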
