
Downloading CSV files in Python

So I'm trying to download stock data using the code below:

from urllib import request

#Download all daily stock data
for firm in ["SONC"]:
  for year in ["2009", "2010", "2011", "2012", "2013", "2014", "2015"]:
    for month in ["01","02","03","04","05","06","07","08","09","10","11","12"]:
      # Retrieve the webpage as a string
      response = request.urlopen("https://www.quandl.com/api/v1/datasets/WIKI/"+firm+".csv?trim_start="+year+"-"+month+"-01&trim_end="+year+"-"+month+"-31&collapse=daily")
      csv = response.read()

      # Save the string to a file
      csvstr = str(csv).strip("b'")

      lines = csvstr.split("\\n")
      f = open(""+firm+"_"+year+""+month+".csv", "w")
      for line in lines:
        f.write(line + "\n")
      f.close()

But I'm running into a problem: it only works for a single iteration (so if I have just one firm, one year and one month, it works), but not for more than one.

Here is the error message I'm getting:

Traceback (most recent call last):
  File "C:/Users/kdaftari/Desktop/ECON431_Program.py", line 8, in <module>
    response = request.urlopen("https://www.quandl.com/api/v1/datasets/WIKI/"+firm+".csv?trim_start="+year+"-"+month+"-01&trim_end="+year+"-"+month+"-31&collapse=daily")
  File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 469, in open
    response = meth(req, response)
  File "C:\Python34\lib\urllib\request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python34\lib\urllib\request.py", line 507, in error
    return self._call_chain(*args)
  File "C:\Python34\lib\urllib\request.py", line 441, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 422: Unprocessable Entity

You are sending an invalid date; the server is telling you that February 31st does not exist:

$ curl -D - -s "http://www.quandl.com/api/v1/datasets/WIKI/SONC.csv?trim_start=2009-02-01&trim_end=2009-02-31&collapse=daily"
HTTP/1.1 422 Unprocessable Entity
Cache-Control: no-cache
Content-Disposition: filename=WIKI-SONC.csv
Content-Type: text/csv
Date: Sat, 07 Mar 2015 22:28:59 GMT
Server: nginx
Status: 422 Unprocessable Entity
X-RateLimit-Limit: 50
X-RateLimit-Remaining: 38
X-Request-Id: b5d774b5-e916-40ef-92c4-443ceccf2ba6
X-Runtime: 0.025214
Content-Length: 117
Connection: keep-alive

error
trim_end:You provided 2009-02-31 for trim_end. This is not a recognized date format. Please provide yyyy-mm-dd

Note the error message in the body.
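
The same error body is visible from Python as well, not just from curl; urllib.error.HTTPError is itself a file-like response object, so a small sketch along these lines (reusing the failing February URL) prints the same message:

from urllib import request, error

url = ("https://www.quandl.com/api/v1/datasets/WIKI/SONC.csv"
       "?trim_start=2009-02-01&trim_end=2009-02-31&collapse=daily")
try:
    response = request.urlopen(url)
except error.HTTPError as ex:
    # HTTPError can be read like a response, so the error body is accessible
    print(ex.code)
    print(ex.read().decode("utf-8", errors="replace"))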

You can easily generate correct dates using datetime.date() objects:

from datetime import date, timedelta

for firm in ["SONC"]:
    for year in range(2009, 2016):
        for month in range(1, 13):
            startdate = date(year, month, 1)
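            # Last day of the month: first day of the following month minus one day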
            enddate = date(year + (month // 12), month % 12 + 1, 1) - timedelta(days=1)
            url = 'http://www.quandl.com/api/v1/datasets/WIKI/{}.csv?trim_start={:%Y-%m-%d}&trim_end={:%Y-%m-%d}&collapse=daily'.format(
                firm, startdate, enddate)
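
For completeness, a minimal sketch of how those URLs could then be fetched and saved; writing the raw response bytes in binary mode also removes the need for the str(csv).strip("b'") workaround in the question (the file name simply follows the question's firm_yearmonth pattern):

from datetime import date, timedelta
from urllib import request

for firm in ["SONC"]:
    for year in range(2009, 2016):
        for month in range(1, 13):
            startdate = date(year, month, 1)
            # Last day of the month: first day of the following month minus one day
            enddate = date(year + (month // 12), month % 12 + 1, 1) - timedelta(days=1)
            url = ('https://www.quandl.com/api/v1/datasets/WIKI/{}.csv'
                   '?trim_start={:%Y-%m-%d}&trim_end={:%Y-%m-%d}&collapse=daily').format(
                       firm, startdate, enddate)

            # Write the response bytes straight to disk
            response = request.urlopen(url)
            with open('{}_{}{:02d}.csv'.format(firm, year, month), 'wb') as f:
                f.write(response.read())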

You tried to download a URL using urllib.request, and the web server responded with error 422 Unprocessable Entity.

Also, if you look at the server's response, it describes the error as:

error
trim_end:You provided 2009-02-31 for trim_end. This is not a recognized date format. Please provide yyyy-mm-dd

As Martijn Pieters pointed out: 2009-02-31 is an invalid date.
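
You can confirm that directly with the standard datetime module, which rejects the date outright:

from datetime import date

date(2009, 2, 31)  # raises ValueError: day is out of range for month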

Here, I have fixed the code for you:

import calendar
import time
from urllib import request, error as urllib_error

#Download all daily stock data
for firm in ["SONC"]:
    for year in range(2009, 2016): # from 2009 to 2015 inclusive
        for month in range(1, 13):   # from 1 to 12 inclusive
            # Get number of days in month
            days_in_month = calendar.monthrange(year, month)[1]

            # Retrieve the webpage as a string
            url = "https://www.quandl.com/api/v1/datasets/WIKI/{firm}.csv" \
                "?trim_start={year}-{month}-01&trim_end={year}-{month}-{days_in_month}" \
                "&collapse=daily".format(
                    firm=firm, year=year, month=month, days_in_month=days_in_month)

            # For easier debugging
            print(url)

            sleep_time = 1
            while True:
                try:
                    response = request.urlopen(url)
                    csv = response.read()
                except urllib_error.HTTPError as ex:
                    if ex.code == 429:  # Too Many Requests
                        print("Server replied with 'Too many requests', sleeping for {} second(s)...".format(sleep_time))
                        time.sleep(sleep_time)

                        # Increase sleep time so that retries don't overload the server
                        sleep_time = min(2 * sleep_time, 60)
                        continue
                    raise  # any other HTTP error is not retryable, re-raise it
                else:
                    break  # download succeeded, stop retrying

            # Save the string to a file
            file_name = "{firm}_{year}_{month}.csv".format(
                firm=firm, year=year, month=month)
            with open(file_name, "wb") as f:
                f.write(csv)
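
As a quick sanity check, one of the saved files (e.g. SONC_2009_1.csv, the name the snippet above would produce for January 2009) can be read back with the standard csv module:

import csv

with open("SONC_2009_1.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    rows = list(reader)

print(header)
print("{} trading days in the file".format(len(rows)))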
