
Python: How to download Excel file from page

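  1. Go to this URL https://www.horseracebase.com/horse-racing-results.php?year=2005&month=3&day=15 (username = TrickyBen | password = TrickyBen123)
  2. Note that there is a red "Download Excel" button.
  3. I want to download the Excel file and convert it into a pandas DataFrame. I want to do this programmatically (i.e., from a script rather than by clicking through the website manually). How can I do this?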

This code logs you in as TrickyBen and makes a request to the site's API...

import requests
from lxml import html
from requests import Session
import pandas as pd
import shutil

raceSession = Session()

LoginDetails = {'login': 'TrickyBen', 'password': 'TrickyBen123'}

LoginUrl = 'https://www.horseracebase.com/horse-racing-results.php?year=2005&month=3&day=15/horsebase1.php'
LoginPost = raceSession.post(LoginUrl, data=LoginDetails)

RaceUrl = 'https://www.horseracebase.com/excelresults.php'
RaceDataDetails =  {"user": "41495", "racedate": "2005-3-15", "downloadbutton": "Excel"}

PostHeaders = {"Content-Type": "application/x-www-form-urlencoded"}
Response = raceSession.post(RaceUrl, data=RaceDataDetails, headers=PostHeaders)

Table = pd.read_table(Response.text)

Table.to_csv('blahblah.csv')

If you inspect the element, you'll notice that the relevant element looks like this...

<form action="excelresults.php" method="post">
    <input type="hidden" name="user" value="41495">
    <input type="hidden" name="racedate" value="2005-3-15">
    <input type="submit" class="downloadbutton" value="Excel">
</form>

I get this error message...

Traceback (most recent call last):
  File "/Users/Alex/Desktop/DateTest/hrpull.py", line 20, in <module>
    Table = pd.read_table(Response.text)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 562, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 315, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 645, in __init__
    self._make_engine(self.engine)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 799, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 1213, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 358, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3427)
  File "pandas/parser.pyx", line 628, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:6861)
IOError: File race_date race_time   track   race_name       race_restrictions_age   race_class  major   race_distance   prize_money     going_description   number_of_runners   place   distbt  horse_name  stall       trainer horse_age   jockey_name jockeys_claim   pounds  odds    fav     official_rating comptime    TotalDstBt  MedianOR    Dist_Furlongs       placing_numerical   RCode   BFSP    BFSP_Place  PlcsPaid    BFPlcsPaid      Yards   RailMove    RaceType    
"2005-03-15"    "14:00:00"  "Cheltenham"    "Letheby & Christopher Supreme Novices Hurdle " "4yo+"  "Class 1"   "Grade 1"   "2m˝f " "58000" "Good"  "20"    "1st"       "Arcalis"   "0" "Johnson, J Howard" "5" "Lee, G"    "0" "161"   "21"        "136"   "3 mins 53.00s"     "121.5" "16.5"  "1" "National Hunt" "0" "0" "3" "0" "0" "0" "Novices Hurdle"
"2005-03-15"    "14:00:00"  "Cheltenham"    "Letheby & Christopher Supreme Novices Hurdle " "4yo+"  "Class 1"   "Grade 1"   "2m˝f " "58000" "Good"  "20"    "2nd"   "6" "Wild Passion (GER)"    "0" "Meade, Noel"   "5" "Carberry, P"   "0" "161"   "11"        "0" "3 mins 53.00s" "6" "121.5" "16.5"  "2" "National Hunt" "0" "0" "3" "0" "0" "0" "Novices Hurdle"

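I think you can see the data you want to download on another web page, for example by clicking "My Systems (v4)". If so, you can use urllib.request.urlretrieve to download that page. You can then use html.parser.HTMLParser to parse the data and process it as needed.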
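
As a rough illustration of the parsing step, here is a minimal sketch that pulls cell text out of an HTML table with html.parser.HTMLParser. The sample markup and the TableParser class name are assumptions made for the example; the real page's structure has not been checked, and fetching it would additionally require the logged-in session.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical sample markup standing in for the downloaded page.
sample_html = """
<table>
  <tr><th>track</th><th>horse_name</th></tr>
  <tr><td>Cheltenham</td><td>Arcalis</td></tr>
</table>
"""

parser = TableParser()
parser.feed(sample_html)
print(parser.rows)
```

The resulting list of rows could then be handed to pandas.DataFrame, using the first row as the column names.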

If you look at the API called in the form's action, you will see that you have to make a POST request to this URL:

https://www.horseracebase.com/excelresults.php

with the following parameters:

data = {
    "user": "41495", # looks like this varies with login, so update in case you change your login id
    "racedate": "2005-3-15",
    "downloadbutton": "Excel"
}
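
Since the user value appears to vary with the login, one option is to read the parameters from the form markup rather than hard-coding them. The following is a minimal sketch using only the standard library; the FormFieldParser class name is made up for the example, and the markup is the form quoted in the question. Inputs without a name attribute (here, the submit button) are skipped, because a browser does not submit them either.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name/value pairs from <input> elements that have a name."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is not None:
            # A missing or empty value attribute becomes an empty string.
            self.fields[name] = attributes.get("value") or ""


# The form markup quoted in the question.
form_html = """
<form action="excelresults.php" method="post">
    <input type="hidden" name="user" value="41495">
    <input type="hidden" name="racedate" value="2005-3-15">
    <input type="submit" class="downloadbutton" value="Excel">
</form>
"""

field_parser = FormFieldParser()
field_parser.feed(form_html)
payload = field_parser.fields
print(payload)
```

In a real script, form_html would be the text of the results page fetched with the logged-in session; that step is not shown here.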

You can do something like this:

response = raceSession.post(reqUrl, data=data)

If that doesn't work, try adding headers to the request, for example headers=postHeaders. Since you are sending form-encoded data here, you should set the Content-Type header:

headers = {"Content-Type": "application/x-www-form-urlencoded"} 
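
To make "form-encoded" concrete, this sketch uses only the standard library to show how the same parameters look as a form-encoded body and as a JSON body. In requests, a dict passed as data= is sent in the first form and a dict passed as json= in the second; an HTML form with method="post" submits the first form by default.

```python
import json
from urllib.parse import urlencode

payload = {"user": "41495", "racedate": "2005-3-15", "downloadbutton": "Excel"}

# Body for Content-Type: application/x-www-form-urlencoded
form_body = urlencode(payload)

# Body for Content-Type: application/json
json_body = json.dumps(payload)

print(form_body)
print(json_body)
```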

Read this for more information on how to save the Excel data to a file.

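Here is the response to this request in Postman, so it looks like you don't need any headers other than Content-Type:

[Image: screenshot of the Postman response]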

Edit

Here is what you need to do:

from io import StringIO  # Python 3; on Python 2.x use: from StringIO import StringIO

import pandas as pd
from requests import Session

raceSession = Session()

RaceUrl = 'https://www.horseracebase.com/excelresults.php'
RaceDataDetails = {"user": "41495", "racedate": "2005-3-15", "downloadbutton": "Excel"}

PostHeaders = {"Content-Type": "application/x-www-form-urlencoded"}
Response = raceSession.post(RaceUrl, data=RaceDataDetails, headers=PostHeaders)

# read_table expects a path or a file-like object, so wrap the text in StringIO
Table = pd.read_table(StringIO(Response.text))
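
The StringIO step can be tried without touching the website. This is a minimal sketch assuming the response is tab-separated text with quoted values, as the output in the traceback suggests; sample_text is a shortened, hypothetical excerpt and not the real response.

```python
from io import StringIO

import pandas as pd

# Hypothetical three-column excerpt of the tab-separated response text.
sample_text = (
    'race_date\ttrack\thorse_name\n'
    '"2005-03-15"\t"Cheltenham"\t"Arcalis"\n'
    '"2005-03-15"\t"Cheltenham"\t"Wild Passion (GER)"\n'
)

# StringIO makes the in-memory text look like an open file, so pandas
# parses its contents instead of treating the string as a file path.
table = pd.read_table(StringIO(sample_text))

print(table.shape)
print(table["horse_name"].tolist())
```

From there, table.to_csv('blahblah.csv') writes the parsed data to disk as in the original script.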


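Disclaimer: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.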
