
Resample time series data in a dictionary with Python

I have daily price history data stored in a dictionary in the following format:

test = 
{
datetime(2020, 1, 15): 15.99,
datetime(2020, 1, 16): 18.99,
datetime(2020, 1, 17): 20.99,
datetime(2020, 1, 18): 14.99
.......
}

I am able to plot this data with:

import matplotlib.pyplot as plt

x = list(test.keys())
y = list(test.values())
plt.plot(x, y)
plt.show()

But I want to resample my data to a monthly basis. How can I do that?

Is this what you're after? You can convert your dict to a DataFrame with a datetime index and resample it, aggregating by sum. I used datetime.datetime() rather than just datetime() from your example, which assumes import datetime rather than from datetime import datetime.

import datetime
import pandas as pd

test = {
    datetime.datetime(2020, 1, 15): 15.99,
    datetime.datetime(2020, 1, 16): 18.99,
    datetime.datetime(2020, 1, 17): 20.99,
    datetime.datetime(2020, 1, 18): 14.99,
    datetime.datetime(2020, 2, 18): 17.99,
    datetime.datetime(2020, 2, 19): 21.99,
}

# make a df and transpose it with .T
df = pd.DataFrame(test, index=[0]).T

# rename column 0 so it's more descriptive
df.columns = ['monthly_price_sum']

# resample and choose to aggregate values by sum, but could use max, min, mean, etc.
# note: pandas 2.2+ deprecates the 'M' alias in favor of 'ME' (month end)
df = df.resample('M').sum()

print(df)


            monthly_price_sum
2020-01-31  70.96
2020-02-29  39.98
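As a variation on the above, here is a minimal sketch that skips the transpose by building a Series straight from the dict and computes several aggregates in one pass. The names prices and monthly are just illustrative, and pd.offsets.MonthEnd() is used in place of the 'M' string so the same line should run whether or not your pandas version still accepts that alias.

```python
import datetime
import pandas as pd

test = {
    datetime.datetime(2020, 1, 15): 15.99,
    datetime.datetime(2020, 1, 16): 18.99,
    datetime.datetime(2020, 1, 17): 20.99,
    datetime.datetime(2020, 1, 18): 14.99,
    datetime.datetime(2020, 2, 18): 17.99,
    datetime.datetime(2020, 2, 19): 21.99,
}

# The dict keys become the index and the dict values become the data.
prices = pd.Series(test)
# Make sure the index is a DatetimeIndex, which resample requires.
prices.index = pd.to_datetime(prices.index)

# MonthEnd() is the offset object for month-end bins.
# Passing a list to agg returns one column per aggregate.
monthly = prices.resample(pd.offsets.MonthEnd()).agg(["sum", "mean", "max"])

print(monthly.round(2))
```

Each row is labeled with the last day of its month, just like the 'M' version.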

If you want to get back to a dict, you can zip the index with the column. Zip is nice here because it avoids the extra nesting by column name that df.to_dict() would put inside new_dict.

new_dict = dict(zip(df.index, df['monthly_price_sum']))

print(new_dict)

{Timestamp('2020-01-31 00:00:00', freq='M'): 70.96, Timestamp('2020-02-29 00:00:00', freq='M'): 39.98}
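If the nesting is the only concern, selecting the column first and calling Series.to_dict() should give the same flat mapping as the zip approach. A small sketch, rebuilding the two-row result above by hand so it runs on its own; plain_dict is a hypothetical extra step for anyone who wants ordinary datetime keys instead of Timestamps.

```python
import pandas as pd

# Recreate the monthly result shown above (values copied from the printed output).
df = pd.DataFrame(
    {"monthly_price_sum": [70.96, 39.98]},
    index=pd.to_datetime(["2020-01-31", "2020-02-29"]),
)

# Calling to_dict() on the whole frame nests by column name: {column: {index: value}}.
nested = df.to_dict()

# Calling it on a single column gives the flat {index: value} mapping.
flat = df["monthly_price_sum"].to_dict()

# Optional: swap pandas Timestamps for plain datetime.datetime keys.
plain_dict = {ts.to_pydatetime(): value for ts, value in flat.items()}

# Compare against the zip approach.
print(flat == dict(zip(df.index, df["monthly_price_sum"])))
```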

You can plot the outputs like so.

df.plot(xlabel='month', ylabel='monthly price sum')


You can do this with pandas' DataFrame.resample.

Generate sample data:

import pandas as pd
from datetime import datetime
timestamps = [
    datetime(2020, 1, 27), datetime(2020, 1, 28), datetime(2020, 2, 1), datetime(2020, 2, 2),
    datetime(2020, 3, 5), datetime(2020, 3, 6), datetime(2020, 4, 17), datetime(2020, 4, 18),
]
values = [15.99, 18.99, 20.99, 14.99, 25.99, 48.99, 10.99, 34.99]

Load sample data with timestamps as index:

df = pd.DataFrame({ 'timestamp': timestamps, 'value': values })
df = df.set_index('timestamp')
df.plot(legend=False, xlabel='Date', ylabel='Value')

[Figure: original time series]

Resample by month:

# note: pandas 2.2+ deprecates the 'M' alias in favor of 'ME'
df.resample('M').mean().plot(legend=False, xlabel='Month', ylabel='Value')

[Figure: resampled time series]
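For anyone curious what the monthly mean works out to, the same grouping can be cross-checked without pandas. This is only a plain-Python sketch of the arithmetic (bucket by year and month, then average), not a description of how resample is implemented; by_month and monthly_mean are made-up names.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

timestamps = [
    datetime(2020, 1, 27), datetime(2020, 1, 28), datetime(2020, 2, 1), datetime(2020, 2, 2),
    datetime(2020, 3, 5), datetime(2020, 3, 6), datetime(2020, 4, 17), datetime(2020, 4, 18),
]
values = [15.99, 18.99, 20.99, 14.99, 25.99, 48.99, 10.99, 34.99]

# Bucket every value under its (year, month) key.
by_month = defaultdict(list)
for ts, value in zip(timestamps, values):
    by_month[(ts.year, ts.month)].append(value)

# Average each bucket, rounding to cents for display.
monthly_mean = {month: round(mean(vals), 2) for month, vals in sorted(by_month.items())}

print(monthly_mean)
```

One difference to keep in mind: resample emits a row (NaN for the mean) for any month in the range that has no data, while this version simply has no key for it.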


 