Python: How to download Excel file from page
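This code logs you in as TrickyBen and makes a request to the website's API...
import requests
from lxml import html
from requests import Session
import pandas as pd
import shutil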
raceSession = Session()
LoginDetails = {'login': 'TrickyBen', 'password': 'TrickyBen123'}
LoginUrl = 'https://www.horseracebase.com/horse-racing-results.php?year=2005&month=3&day=15/horsebase1.php'
LoginPost = raceSession.post(LoginUrl, data=LoginDetails)
RaceUrl = 'https://www.horseracebase.com/excelresults.php'
RaceDataDetails = {"user": "41495", "racedate": "2005-3-15", "downloadbutton": "Excel"}
PostHeaders = {"Content-Type": "application/x-www-form-urlencoded"}
Response = raceSession.post(RaceUrl, data=RaceDataDetails, headers=PostHeaders)
Table = pd.read_table(Response.text)
Table.to_csv('blahblah.csv')
If you inspect the element, you will notice that the relevant markup looks like this...
<form action="excelresults.php" method="post">
<input type="hidden" name="user" value="41495">
<input type="hidden" name="racedate" value="2005-3-15">
<input type="submit" class="downloadbutton" value="Excel">
</form>
I get this error message...
Traceback (most recent call last):
File "/Users/Alex/Desktop/DateTest/hrpull.py", line 20, in <module>
Table = pd.read_table(Response.text)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 562, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 315, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 645, in __init__
self._make_engine(self.engine)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 799, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 1213, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 358, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3427)
File "pandas/parser.pyx", line 628, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:6861)
IOError: File race_date race_time track race_name race_restrictions_age race_class major race_distance prize_money going_description number_of_runners place distbt horse_name stall trainer horse_age jockey_name jockeys_claim pounds odds fav official_rating comptime TotalDstBt MedianOR Dist_Furlongs placing_numerical RCode BFSP BFSP_Place PlcsPaid BFPlcsPaid Yards RailMove RaceType
"2005-03-15" "14:00:00" "Cheltenham" "Letheby & Christopher Supreme Novices Hurdle " "4yo+" "Class 1" "Grade 1" "2m˝f " "58000" "Good" "20" "1st" "Arcalis" "0" "Johnson, J Howard" "5" "Lee, G" "0" "161" "21" "136" "3 mins 53.00s" "121.5" "16.5" "1" "National Hunt" "0" "0" "3" "0" "0" "0" "Novices Hurdle"
"2005-03-15" "14:00:00" "Cheltenham" "Letheby & Christopher Supreme Novices Hurdle " "4yo+" "Class 1" "Grade 1" "2m˝f " "58000" "Good" "20" "2nd" "6" "Wild Passion (GER)" "0" "Meade, Noel" "5" "Carberry, P" "0" "161" "11" "0" "3 mins 53.00s" "6" "121.5" "16.5" "2" "National Hunt" "0" "0" "3" "0" "0" "0" "Novices Hurdle"
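I think you can view the data you want to download on another web page, for example by clicking "My Systems (v4)". If so, you can use urllib.request.urlretrieve to download that page, then use html.parser.HTMLParser to parse the data and process it as needed.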
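As a rough illustration of the HTMLParser suggestion, here is a minimal sketch that collects table cells row by row. The class name TableCellParser and the sample markup are hypothetical; the real page layout is not shown in this thread, so the tag handling would likely need adjusting.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical sample markup. A real page would first be fetched, e.g. with
# urllib.request.urlretrieve(url, "page.html"), and then read from that file.
sample_html = """
<table>
  <tr><th>track</th><th>horse_name</th></tr>
  <tr><td>Cheltenham</td><td>Arcalis</td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(sample_html)
print(parser.rows)
```

The resulting list of rows can then be written out with the csv module or handed to pandas.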
If you look at the API called in the form's action, you will see that you have to make a POST request to this URL:
https://www.horseracebase.com/excelresults.php
with the following parameters:
data = {
"user": "41495", # looks like this varies with login, so update in case you change your login id
"racedate": "2005-3-15",
"downloadbutton": "Excel"
}
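You can do something like this:
# reqUrl is the URL above; data= sends the dict as a form-encoded body, which is what the HTML form submits
response = raceSession.post(reqUrl, data=data)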
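If that doesn't work, try adding headers to the request, e.g. headers=postHeaders. In this case you are sending form-encoded data, so you should set the Content-Type header accordingly: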
headers = {"Content-Type": "application/x-www-form-urlencoded"}
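To make the difference between a form-encoded body and a JSON body concrete, here is a small standard-library sketch. It only builds the two body strings and sends nothing. In requests, passing a dict via data= produces the first form, while json= produces the second.

```python
import json
from urllib.parse import urlencode

data = {"user": "41495", "racedate": "2005-3-15", "downloadbutton": "Excel"}

# What an HTML form (Content-Type: application/x-www-form-urlencoded) sends.
form_body = urlencode(data)

# What a JSON request (Content-Type: application/json) would send instead.
json_body = json.dumps(data)

print(form_body)
print(json_body)
```

Since the page submits a plain HTML form, the first encoding is the one the server most likely expects.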
Read this for more information on how to save the Excel download to a file.
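The linked material is not reproduced here. As a rough sketch of one way to do it, assuming the response body is available as bytes (as Response.content is in requests), the download can be written to disk unchanged in binary mode. The helper name save_download, the sample bytes, and the file name results.xls are all hypothetical.

```python
import os
import tempfile


def save_download(content, path):
    """Write raw response bytes to path unchanged and return the byte count."""
    with open(path, "wb") as handle:
        return handle.write(content)


# Stand-in for Response.content from the POST request (assumed tab-separated).
sample_content = b'race_date\ttrack\n"2005-03-15"\t"Cheltenham"\n'

target = os.path.join(tempfile.mkdtemp(), "results.xls")
written = save_download(sample_content, target)
print(written, "bytes saved to", target)
```

Writing in binary mode avoids any newline or encoding translation, so the file on disk matches what the server sent.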
Judging by the response to this request in Postman, it looks like you don't need any headers other than Content-Type.
Edit
Here is what you need to do:
raceSession = Session()
RaceUrl = 'https://www.horseracebase.com/excelresults.php'
RaceDataDetails = {"user": "41495", "racedate": "2005-3-15", "downloadbutton": "Excel"}
PostHeaders = {"Content-Type": "application/x-www-form-urlencoded"}
Response = raceSession.post(RaceUrl, data=RaceDataDetails, headers=PostHeaders)
# StringIO wraps the response text in a file-like object that pandas can read
from io import StringIO  # Python 3.x; on Python 2.x use: from StringIO import StringIO
Table = pd.read_table(StringIO(Response.text))
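The IOError in the question arises because read_table treats a plain string as a file path, so the whole response text was taken for a file name. Wrapping the text in StringIO gives the parser a file-like object instead. The sketch below shows the same idea with only the standard library (csv.reader rather than pandas), using a shortened, hypothetical stand-in for Response.text and assuming the fields are tab-separated.

```python
import csv
import os
import tempfile
from io import StringIO

# Shortened stand-in for Response.text: tab-separated, quoted fields.
sample_text = (
    'race_date\ttrack\thorse_name\n'
    '"2005-03-15"\t"Cheltenham"\t"Arcalis"\n'
    '"2005-03-15"\t"Cheltenham"\t"Wild Passion (GER)"\n'
)

# StringIO makes the string behave like an open file, so the parser reads its
# contents instead of treating it as a file name.
reader = csv.reader(StringIO(sample_text), delimiter="\t")
rows = list(reader)

# Write the parsed rows back out as a comma-separated file.
out_path = os.path.join(tempfile.mkdtemp(), "results.csv")
with open(out_path, "w", newline="") as handle:
    csv.writer(handle).writerows(rows)

print(rows[0])
print(len(rows) - 1, "data rows written to", out_path)
```

With pandas, the same StringIO object can be passed to pd.read_table, and the resulting table saved with to_csv as in the question.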