
Time series resample seems to result in wrong data

I have data at 30-minute intervals. When I resample it to 1 hour, the values I get look too low.

Original data:

2022-12-31 22:00:00+01:00;7.500000
2022-12-31 22:30:00+01:00;8.200000
2022-12-31 23:00:00+01:00;10.800000
2022-12-31 23:30:00+01:00;9.500000
2023-01-01 00:00:00+01:00;12.300000
2023-01-01 00:30:00+01:00;168.399994
2023-01-01 01:00:00+01:00;157.399994
2023-01-01 01:30:00+01:00;73.199997
2023-01-01 02:00:00+01:00;59.700001
2023-01-01 02:30:00+01:00;74.000000

After df = df.resample('h', label='right').mean() I get:

2022-12-31 23:00:00+01:00;7.850000
2023-01-01 00:00:00+01:00;10.150000
2023-01-01 01:00:00+01:00;90.349997
2023-01-01 02:00:00+01:00;115.299995
2023-01-01 03:00:00+01:00;66.850000

Shouldn't the value for 01:00:00 be 162.89?

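I think you are confusing the label and closed parameters. By default, hourly bins are closed on the left, so with label='right' the row labelled 01:00:00 averages the 00:00 and 00:30 values (12.3 and 168.4), which gives 90.35. If you want to get 162.89, you have to use closed='right':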

>>> df.resample('H', closed='right').mean()
2022-12-31 21:00:00+01:00      7.500000
2022-12-31 22:00:00+01:00      9.500000
2022-12-31 23:00:00+01:00     10.900000
2023-01-01 00:00:00+01:00    162.899994  # right value but for 00:00
2023-01-01 01:00:00+01:00     66.449999
2023-01-01 02:00:00+01:00     74.000000
Freq: H, dtype: float64

>>> df.resample('H', closed='right', label='right').mean()
2022-12-31 22:00:00+01:00      7.500000
2022-12-31 23:00:00+01:00      9.500000
2023-01-01 00:00:00+01:00     10.900000
2023-01-01 01:00:00+01:00    162.899994  # right value for 01:00
2023-01-01 02:00:00+01:00     66.449999
2023-01-01 03:00:00+01:00     74.000000
Freq: H, dtype: float64

label controls which bin edge is shown in the index, while closed controls which edge of each bin is inclusive, and therefore which rows are averaged together.
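To check this yourself, here is a minimal sketch that rebuilds the ten readings from the question and compares the two settings. It makes some assumptions: the data is treated as a Series with a fixed +01:00 offset (the real time zone is not given in the question), the values are rounded to one decimal, and "60min" is used in place of 'h'/'H' only to avoid differences between pandas versions in how the hourly alias is spelled.

```python
from datetime import timedelta, timezone

import pandas as pd

# Assumption: a fixed UTC+01:00 offset stands in for the unknown real time zone.
TZ = timezone(timedelta(hours=1))

# The ten half-hourly readings from the question, rounded to one decimal.
index = pd.date_range("2022-12-31 22:00", periods=10, freq="30min", tz=TZ)
values = [7.5, 8.2, 10.8, 9.5, 12.3, 168.4, 157.4, 73.2, 59.7, 74.0]
s = pd.Series(values, index=index)

# Default closed='left': each bin is [start, end), labelled with its right edge.
left_closed = s.resample("60min", label="right").mean()

# closed='right': each bin is (start, end], still labelled with its right edge.
right_closed = s.resample("60min", closed="right", label="right").mean()

one_am = pd.Timestamp("2023-01-01 01:00", tz=TZ)
print(left_closed[one_am])   # mean of the 00:00 and 00:30 rows, about 90.35
print(right_closed[one_am])  # mean of the 00:30 and 01:00 rows, about 162.9

# Side-by-side view of both results on the shared right-edge labels.
print(pd.DataFrame({"closed_left": left_closed, "closed_right": right_closed}))
```

In the side-by-side frame, the 22:00 label only has a value for closed='right', because only in that case does a bin end at 22:00 and include the 22:00 reading.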
