I was trying to resample time series data from 15-minute intervals to weekly totals, but it does not work. I have read the documentation and many related questions, but I still do not understand what is wrong.
My code is as follows:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline
Wind = pd.read_csv(r'C:\WindData.csv')  # raw string so the backslash is not treated as an escape
Wind = Wind.dropna()  # Drop rows containing any NaN
Wind.index = pd.to_datetime(Wind['Date'])
Wind_Weekly = Wind['Date'].resample('W').sum()
The raw data looks like this:
Date Actual Forecast Demand
0 01/01/2017 00:00 1049 1011.0 2922
1 01/01/2017 00:15 961 1029.0 2892
2 01/01/2017 00:30 924 1048.0 2858
3 01/01/2017 00:45 852 1066.0 2745
After resampling, the data looks like this:
Date
2017-01-01 01/01/2017 00:0001/01/2017 00:1501/01/2017 00:...
2017-01-08 01/02/2017 00:0001/02/2017 00:1501/02/2017 00:...
2017-01-15 01/09/2017 00:0001/09/2017 00:1501/09/2017 00:...
2017-01-22 16/01/2017 00:0016/01/2017 00:1516/01/2017 00:...
I just want to sum Actual, Forecast and Demand separately on a weekly basis. Do you know what I did wrong?
You're calling resample on a pd.Series containing only your Date variable as a string, so pandas is summing up the strings by concatenating them together within each weekly bin. Change this:
Wind_Weekly = Wind['Date'].resample('W').sum()
To this:
Wind_Weekly = Wind.resample('W').sum()
# The next line also works, and keeps only the listed columns (no Date column in the result).
# Note the double brackets: the column names are passed as a list.
Wind_Weekly = Wind.resample('W')[['Actual', 'Forecast', 'Demand']].sum()
Calling Wind['Date'] returns a pd.Series which ONLY contains your dates as they were BEFORE being transformed to datetime, i.e. as strings. So no Actual, Forecast or Demand variables are actually passed to the resample call.
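Here is a minimal, self-contained sketch of the corrected workflow. The data is synthetic (a hypothetical stand-in for WindData.csv, with made-up constant values), and it assumes the Date strings are day-first, which the 16/01/2017 entries in your output suggest. Your output also shows 01/02/2017 landing in the week ending 2017-01-08, which hints that pd.to_datetime guessed month-first for ambiguous dates, so the sketch passes an explicit format:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV: two full weeks (Mon 2 Jan to Sun 15 Jan 2017)
# at 15-minute steps, with constant made-up values so the sums are easy to check.
stamps = pd.date_range("2017-01-02", periods=2 * 7 * 96, freq="15min")
wind = pd.DataFrame({
    "Date": stamps.strftime("%d/%m/%Y %H:%M"),  # strings, as they would come out of read_csv
    "Actual": np.ones(len(stamps), dtype=int),
    "Forecast": np.full(len(stamps), 2.0),
    "Demand": np.full(len(stamps), 3),
})

# Assumption: the dates are day-first, so spell the format out instead of letting pandas guess.
wind.index = pd.to_datetime(wind["Date"], format="%d/%m/%Y %H:%M")

# Resample the numeric columns (not the 'Date' strings) into weekly sums.
weekly = wind[["Actual", "Forecast", "Demand"]].resample("W").sum()
print(weekly)
```

Each weekly bin holds 7 * 96 = 672 rows here, so the sums should come out as 672, 1344.0 and 2016 for the weeks ending 2017-01-08 and 2017-01-15.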
You can check that:
>>> type(Wind['Date'])
<class 'pandas.core.series.Series'>
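The type alone only tells you it is a Series. A more direct check, sketched below with two sample strings in the same layout as your Date column, is to look at whether the values are datetimes or still plain strings. The format string is an assumption based on the day-first dates in your data:

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Two sample strings in the same layout as the 'Date' column
dates = pd.Series(["01/01/2017 00:00", "16/01/2017 00:15"])
print(is_datetime64_any_dtype(dates))   # False: still plain strings

# After explicit parsing the Series holds real timestamps
parsed = pd.to_datetime(dates, format="%d/%m/%Y %H:%M")
print(is_datetime64_any_dtype(parsed))  # True
```

Summing the first Series concatenates text, while the second one is what you would want as an index before resampling.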
For testing, I reproduced your problem with the following code:
import pandas as pd
import numpy as np
rng = pd.date_range('1/1/2012', periods=100, freq='D')
df = pd.DataFrame( # Construct df with a datetime index and some numbers
{'ones': np.ones(100), 'twos': np.full(100, 2), 'zeros': np.zeros(100)},
index=rng
)
df['Date'] = rng.astype(str) # re-add index as a str
In the interpreter:
>>> df.resample('W').sum() # works out of the box
ones twos zeros
2012-01-01 1.0 2 0.0
2012-01-08 7.0 14 0.0
2012-01-15 7.0 14 0.0
...
>>> df['Date'].resample('W').sum() # same result as yours: only the 'Date' column is resampled
2012-01-01 2012-01-01
2012-01-08 2012-01-022012-01-032012-01-042012-01-052012-0...
2012-01-15 2012-01-092012-01-102012-01-112012-01-122012-0...
2012-01-22 2012-01-162012-01-172012-01-182012-01-192012-0...
...