How to use pandas to resample time series data

I am trying to resample time series data from 15-minute intervals to weekly totals, but it does not work. I have read the documentation and many related questions, but I still do not understand what is going wrong.

My code is as follows:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline
Wind = pd.read_csv(r'C:\WindData.csv')  # raw string so the backslash is kept literally
Wind = Wind.dropna()  # drop rows containing any NaN
Wind.index = pd.to_datetime(Wind['Date'])
Wind_Weekly = Wind['Date'].resample('W').sum()

The raw data looks like this

                Date  Actual  Forecast  Demand
0   01/01/2017 00:00    1049    1011.0    2922
1   01/01/2017 00:15     961    1029.0    2892
2   01/01/2017 00:30     924    1048.0    2858
3   01/01/2017 00:45     852    1066.0    2745

After resampling, the data looks like this:

Date
2017-01-01    01/01/2017 00:0001/01/2017 00:1501/01/2017 00:...
2017-01-08    01/02/2017 00:0001/02/2017 00:1501/02/2017 00:...
2017-01-15    01/09/2017 00:0001/09/2017 00:1501/09/2017 00:...
2017-01-22    16/01/2017 00:0016/01/2017 00:1516/01/2017 00:...

I just want to sum Actual, Forecast and Demand separately on a weekly basis. What did I do wrong?

You're calling resample on a pd.Series that contains only your Date column, which still holds strings. Summing strings concatenates them, so each weekly bin ends up with all of its date strings joined together. Change this:

Wind_Weekly = Wind['Date'].resample('W').sum() 

To this:

Wind_Weekly = Wind.resample('W').sum()
# Selecting the numeric columns with a list also works, and keeps the string Date column out of the result
Wind_Weekly = Wind.resample('W')[['Actual', 'Forecast', 'Demand']].sum()

Wind['Date'] returns a pd.Series that contains ONLY your dates, still as the original strings: pd.to_datetime converted the index, not the column. So the Actual, Forecast and Demand columns are never passed to the resample call.

You can check that:

>>> type(Wind['Date'])
<class 'pandas.core.series.Series'>
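
A slightly more telling check is to compare the index with the column. The snippet below is only a minimal sketch: `wind` is a hypothetical two-row stand-in for the question's frame, and the explicit `format` string assumes the dates are written day-first.

```python
import pandas as pd

# Hypothetical two-row stand-in for the question's Wind frame.
wind = pd.DataFrame({
    'Date': ['01/01/2017 00:00', '01/01/2017 00:15'],
    'Actual': [1049, 961],
})
# Assumption: dates are day-first (dd/mm/yyyy hh:mm).
wind.index = pd.to_datetime(wind['Date'], format='%d/%m/%Y %H:%M')

# The index was converted, but the column itself still holds the raw strings.
index_is_datetime = isinstance(wind.index, pd.DatetimeIndex)
column_is_datetime = pd.api.types.is_datetime64_any_dtype(wind['Date'])
print(index_is_datetime, column_is_datetime)  # True False
```

So resample has a valid datetime index to bin on, but the values it is asked to sum are strings.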

For testing, I reproduced your problem with the following code:

import pandas as pd
import numpy as np

rng = pd.date_range('1/1/2012', periods=100, freq='D')
df = pd.DataFrame( # Construct df with a datetime index and some numbers
    {'ones': np.ones(100), 'twos': np.full(100, 2), 'zeros': np.zeros(100)}, 
    index=rng
)
df['Date'] = rng.astype(str) # re-add index as a str

In the interpreter:

>>> df.resample('W').sum() # works out of the box
            ones  twos  zeros
2012-01-01   1.0     2    0.0
2012-01-08   7.0    14    0.0
2012-01-15   7.0    14    0.0
...

>>> df['Date'].resample('W').sum() # same result as yours: only the 'Date' column is resampled
2012-01-01                                           2012-01-01
2012-01-08    2012-01-022012-01-032012-01-042012-01-052012-0...
2012-01-15    2012-01-092012-01-102012-01-112012-01-122012-0...
2012-01-22    2012-01-162012-01-172012-01-182012-01-192012-0...
...
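
Putting it together for data shaped like the question's, a rough end-to-end sketch might look like the following. The CSV text is a hypothetical in-memory stand-in built from the four rows shown in the question. The explicit `format` is an assumption: the question's output contains both 01/02/2017 and 16/01/2017, which suggests the dates are day-first and that letting pandas guess could misread the ambiguous ones.

```python
import io
import pandas as pd

# Hypothetical in-memory sample using the four rows shown in the question.
csv_text = """Date,Actual,Forecast,Demand
01/01/2017 00:00,1049,1011.0,2922
01/01/2017 00:15,961,1029.0,2892
01/01/2017 00:30,924,1048.0,2858
01/01/2017 00:45,852,1066.0,2745
"""
wind = pd.read_csv(io.StringIO(csv_text)).dropna()

# Assumption: dates are day-first (dd/mm/yyyy), so the format is given explicitly.
wind.index = pd.to_datetime(wind['Date'], format='%d/%m/%Y %H:%M')

# Select the numeric columns with a list, then sum within each weekly bin.
wind_weekly = wind.resample('W')[['Actual', 'Forecast', 'Demand']].sum()
print(wind_weekly)
```

With only these four rows everything falls into the single weekly bin labelled 2017-01-01 (a Sunday, the default end of a 'W' bin). A full file would produce one row per week.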
