简体   繁体   English

循环遍历数据范围以从 API 下载数据

[英]Looping through a data range to download data from an API

I am downloading data from an API, the maximum look back period is 833 days, but from my testing I know they have data going back to 2002. I have a function below that defines 833 days from today into the two date times "end" and "start", these are fed into an API command.我正在从 API 下载数据,最长回溯期为 833 天,但从我的测试中我知道他们的数据可以追溯到 2002 年。我在下面有一个函数,它定义了从今天到两个日期时间“结束”的 833 天和“开始”,这些被输入到一个 API 命令中。 Note they need to be in a string format and formatted in that way for the api to accept them.请注意,它们需要采用字符串格式并以这种方式格式化,以便 api 接受它们。

d=datetime.today()
end = str(d.year) + "-" + str(d.month) + "-" + str(d.day)

lookbook_period = 833

# Take current date and minus the the max time delta of 833 days to get 'start' var
time_delta = timedelta(days=lookbook_period)
now = datetime.now()

#split the answer and format it to the required timestamp type.
start = str(now - time_delta).split(" ")[0]

What I want to do is download the dataframes in 833 day sections and then piece them all together into a CSV or dataframe.我想要做的是下载 833 天部分的数据帧,然后将它们拼凑成一个 CSV 或数据帧。 I have below what I have so far but I am not sure how to make a function that changes the dates as it goes.到目前为止,我有以下内容,但我不确定如何制作一个可以随时更改日期的函数。


def time_machine():
    df_total = pd.DataFrame
    
    start_str = str(2002) + "-0" + str(5) + "-0" + str(1)
    start = datetime(2002,5,1)
    print(start)
    
    # amount of days from 2002-05-01 to now
    rolling_td = timedelta(days=int(str((datetime.today() - start)).split(" ")[0]))
    print(rolling_td, "\n")
    
    # API maximum amount of lookbook days
    max_td = timedelta(days=833)
    
    # The function would do something similar to this, and on each pass, calling the API and saving the data to a dataframe or CSV.


    s1 = start + max_td
    print(s1)
    s2 = s1 + max_td
    print(s2)
    s3 = s2 + max_td
    print(s3)
    
    d=datetime.today()
    end = str(d.year) + "-" + str(d.month) + "-" + str(d.day)
    print(d)

Any suggestions or tools/libraries to look at would be greatly appreciated.任何建议或工具/库查看将不胜感激。 I have been testing things with while-loops but I am still running blindly into the dark on this one.我一直在用 while 循环测试东西,但我仍然盲目地在这个循环中陷入困境。

This a rough sudo code of what I think I need, but Im still unsure on how to move onto the next section这是我认为我需要的粗略 sudo 代码,但我仍然不确定如何进入下一部分

while count > 0 and > 833:
        start = 
        end =
        
        
    call the API first to download first set of data. 
    Check date range:
        get most recent date + 833 days to it 
            Download next section
                repeat
    
    if count < 833: 
            calulate requied dates for start and end 

If you first define your date ranges, you will be able to iterate through each 833 day period to pull data using the API.如果您首先定义日期范围,您将能够在每 833 天的时间段内迭代以使用 API 提取数据。 You'll then need to append the data to the dataframe (or csv) for each iteration.然后,您需要为每次迭代将数据附加到数据框(或 csv)。

import datetime as dt

# Date range to pull data over
start_date = dt.date(2002,5,1)
end_date = dt.date.today()
delta = dt.timedelta(days=832) # 832 so you have a range of 833 days inclusive

# Iterating from start date, recording date ranges of 833 days
date_ranges = []
temp_start_date = start_date
while temp_start_date < end_date:
    temp_end_date = temp_start_date + delta 
    if temp_end_date > end_date:
        temp_end_date = end_date
    date_ranges.append([temp_start_date, temp_end_date])
    temp_start_date = temp_end_date + dt.timedelta(days=1)

# For each date range, pass dates into API
# Initialise dataframe here
for start_date, end_date in date_ranges:
    start_date_str = start_date.strftime("%Y-%m-%d")
    end_date_str = end_date.strftime("%Y-%m-%d")

    # Input to API with start and end dates in correct string format

    # Process data into dataframe

There should be no need to count 833 days, as you said the API takes the start and end dates as arguments, hence you just need to find those for each date range.不需要计算 833 天,正如您所说,API 将开始日期和结束日期作为参数,因此您只需要为每个日期范围找到它们。

You can use pandas date_range to get your dates every 833 days and just iterate through.您可以使用 pandas date_range 每 833 天获取一次日期并进行迭代。 Also, your date format looks to be isoformat.此外,您的日期格式看起来是 isoformat。

Put your code inside the for loop to get the data into the dataframe, then output to csv.将您的代码放在 for 循环中以将数据放入数据帧,然后输出到 csv。

dates_list = pd.date_range(start=datetime.now(), periods=10, freq='-833D')

for i in range(1, len(dates_list)):
    # print(dates_list[i], dates_list[i-1])
    start = dates_list[i].date().isoformat()
    end = dates_list[i-1].date().isoformat()
    print(start, end)

2019-07-15 2021-10-25
2017-04-03 2019-07-15
2014-12-22 2017-04-03
2012-09-10 2014-12-22
2010-05-31 2012-09-10
2008-02-18 2010-05-31
2005-11-07 2008-02-18
2003-07-28 2005-11-07
2001-04-16 2003-07-28

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM