
Iterate through multiple months to get different data

I need to go through each day from January 1, 2015 through February 5, 2020.

The following script gets me the month-end date of each month up to February 5, 2020:

import pandas as pd
# pd.datetime was removed in pandas 2.0; pd.Timestamp.now() is the equivalent call
date = pd.Timestamp.now().strftime("%Y%m%d")


dates = pd.date_range(start='20150101', end='20200205', freq = "M").strftime("%Y%m%d")

print(dates)

Result:

Index(['20150131', '20150228', '20150331', '20150430', '20150531', '20150630',
       '20150731', '20150831', '20150930', '20151031', '20151130', '20151231',
       '20160131', '20160229', '20160331', '20160430', '20160531', '20160630',
       '20160731', '20160831', '20160930', '20161031', '20161130', '20161231',
       '20170131', '20170228', '20170331', '20170430', '20170531', '20170630',
       '20170731', '20170831', '20170930', '20171031', '20171130', '20171231',
       '20180131', '20180228', '20180331', '20180430', '20180531', '20180630',
       '20180731', '20180831', '20180930', '20181031', '20181130', '20181231',
       '20190131', '20190228', '20190331', '20190430', '20190531', '20190630',
       '20190731', '20190831', '20190930', '20191031', '20191130', '20191231',
       '20200131'],
      dtype='object')
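
Note that freq="M" only yields month-end dates, so the partial month of February 2020 is missing from the result above. A minimal sketch of one way to get a (start, end) pair per month, with the last pair clipped to the requested end date, is shown below. The helper name month_ranges is hypothetical and not part of the original scripts.

```python
import pandas as pd

def month_ranges(start, end):
    """Return one (start, end) pair of YYYYMMDD strings per calendar month."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    # Month-start dates from the first month through the month containing `end`
    month_starts = pd.date_range(start.replace(day=1), end, freq="MS")
    ranges = []
    for s in month_starts:
        first = max(s, start)                        # clip the first month to `start`
        last = min(s + pd.offsets.MonthEnd(0), end)  # clip the last month to `end`
        ranges.append((first.strftime("%Y%m%d"), last.strftime("%Y%m%d")))
    return ranges

ranges = month_ranges("20150101", "20200205")
print(ranges[0], ranges[-1], len(ranges))
```

Each pair could then be used as the startDate and endDate of one request.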

The following script scrapes the wind speed for each day in January 2015. In the main block I specify the API key, start date and end date, which are used in the URL. I believe this is where the two scripts could be merged.

import pandas as pd
import requests
import warnings

from urllib3.exceptions import InsecureRequestWarning

headers = {
    'scheme': 'https',
    'accept': 'application/json, text/plain, */*',
    'accept-encoding' : 'gzip, deflate, br',
    'accept-language': 'en-GB,en;q=0.9,en-US;q=0.8,da;q=0.7',
    'origin': 'https://www.wunderground.com',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'cross-site',
    'user-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
}

# Extract the dates and wind speed, and return the daily mean wind speed as a separate Series called dkk
def get_data(response):
    df = response.json()
    df = pd.DataFrame(df["observations"])#[1]["valid_time_gmt", "wspd"]
    df["time"] = pd.to_datetime(df["valid_time_gmt"],unit='s')
    dkk = df.groupby(df["time"].dt.date)["wspd"].mean()
    return dkk 


if __name__ == "__main__":
    date = pd.Timestamp.now().strftime("%d-%m-%Y")

    api_key = "xxxxxx"
    start_date = "20150101"
    end_date = "20150131"

    urls = [
    "https://api.weather.com/v1/location/EGNV:9:GB/observations/historical.json?apiKey="+api_key+"&units=e&startDate="+start_date+"&endDate="+end_date+""
    ]

    # Collect the daily means from every URL and store them as a one-column
    # dataframe in df_transposed, which results in the output below.
    frames = []
    for url in urls:
        warnings.simplefilter('ignore', InsecureRequestWarning)
        res = requests.get(url, headers=headers, verify=False)
        frames.append(get_data(res))
    # DataFrame.append was removed in pandas 2.0, so concatenate the Series instead
    df_transposed = pd.concat(frames).to_frame()
    print(df_transposed)

Results:

                 wspd
2015-01-01  24.333333
2015-01-02  18.696970
...
2015-01-30  12.121212
2015-01-31  21.575758

The question is: I need the wind speed from January 1, 2015 through February 5, 2020. How can I best combine my two scripts to get the desired output, a dataframe with the dates in one column and the wind speed (wspd) in the other?

The desired output:

                 wspd
2015-01-01  24.333333
2015-01-02  18.696970
2015-01-03   8.454545
2015-01-04  10.363636
2015-01-05  11.333333
...
2020-02-04  13.5
2020-02-05  7.1

The wspd for the last two dates can be seen here:

https://www.wunderground.com/history/monthly/gb/darlington/EGNV/date/2020-2
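
One possible way to combine the two scripts is to request one period at a time and concatenate the daily means. The sketch below is only an outline under assumptions: the names URL_TEMPLATE, daily_mean, collect and fake_fetch are hypothetical, the periods list is hard-coded to two months for brevity, and fake_fetch returns made-up observations so the example runs without an API key. It also assumes the API accepts one month per request, as in the January example.

```python
import pandas as pd

URL_TEMPLATE = (
    "https://api.weather.com/v1/location/EGNV:9:GB/observations/historical.json"
    "?apiKey={key}&units=e&startDate={start}&endDate={end}"
)

def daily_mean(payload):
    """Average wspd per calendar day from one decoded JSON response."""
    obs = pd.DataFrame(payload["observations"])
    obs["time"] = pd.to_datetime(obs["valid_time_gmt"], unit="s")
    return obs.groupby(obs["time"].dt.date)["wspd"].mean()

def collect(periods, api_key, fetch_json):
    """Fetch every (start, end) period and stack the daily means into one frame."""
    parts = []
    for start, end in periods:
        url = URL_TEMPLATE.format(key=api_key, start=start, end=end)
        parts.append(daily_mean(fetch_json(url)))
    return pd.concat(parts).to_frame()

def fake_fetch(url):
    """Stand-in for the network call; the observations are made up."""
    start = url.split("startDate=")[1].split("&")[0]
    t0 = int(pd.Timestamp(start).timestamp())
    return {"observations": [
        {"valid_time_gmt": t0, "wspd": 10},
        {"valid_time_gmt": t0 + 3600, "wspd": 20},
    ]}

periods = [("20150101", "20150131"), ("20150201", "20150228")]
result = collect(periods, "xxxxxx", fake_fetch)
print(result)
```

With the real API, fetch_json could be something like lambda url: requests.get(url, headers=headers).json(), and periods would cover every month up to 20200205.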

Use DataFrame.where to keep the rows whose index falls inside the date range and replace the rest with 'XXX':

s = df_transposed.index.to_series()
df_transposed= df_transposed.where((s >='2015-01-01') &(s<='2020-02-05'),'XXX')

EDIT

s = df_transposed.index.to_series()
df_transposed= df_transposed.where((s >=pd.to_datetime('2015-01-01')) &
                                   (s<=pd.to_datetime('2020-02-05')),'XXX')
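
Note that where keeps every row and overwrites the out-of-range values with 'XXX'. If the goal is to drop those rows instead, a possible alternative is to convert the index to a DatetimeIndex and slice it with .loc. This is a sketch with made-up values, assuming the index holds datetime.date objects as produced by the groupby in the question.

```python
import datetime as dt
import pandas as pd

# Made-up sample standing in for df_transposed
df_transposed = pd.DataFrame(
    {"wspd": [5.0, 24.3, 7.1, 9.9]},
    index=[dt.date(2014, 12, 31), dt.date(2015, 1, 1),
           dt.date(2020, 2, 5), dt.date(2020, 2, 6)],
)

# Convert the date objects to a DatetimeIndex so label slicing by date string works
df_transposed.index = pd.to_datetime(df_transposed.index)

# Label slices on a sorted DatetimeIndex include both endpoints
in_range = df_transposed.loc["2015-01-01":"2020-02-05"]
print(in_range)
```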
