
Python: upsampling dataframe from daily to hourly data using ffill()

I'm trying to upsample my data from daily to hourly frequency and forward fill missing data.

I start with the following code:

df1 = pd.read_csv("DATA.csv")   
df1.head(5)

I then used the following to convert to a datetime string and set the date/time as an index:

df1['DT'] = pd.to_datetime(df1['DT']).dt.strftime('%Y-%m-%d %H:%M:%S')
df1.set_index('DT')

I try to resample hourly as follows:

df1['DT'] = df1.resample('H').ffill()

But I get the following error:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'

I thought my dtype was already datetime, as set by the pd.to_datetime code above. Nothing I try seems to work. Can anyone please help me?

My expected output is as follows:

DT                  VALUE
2016-08-01 00:00:00 0.000000
2016-08-01 01:00:00 0.000000
2016-08-01 02:00:00 0.000000

etc.

The file itself has approximately 1000 rows. The first 50 rows or so are zero, so to clarify, this is where the actual data begins:

DT                  VALUE
2018-12-13 00:00:00 24000.000000
2018-12-13 01:00:00 24000.000000
2018-12-13 02:00:00 24000.000000
...
2018-12-13 23:00:00 24000.000000
2018-12-14 00:00:00 26000.000000
2018-12-14 01:00:00 26000.000000

etc.

Try assigning the result back, since set_index returns a new DataFrame instead of modifying df1:

df1=df1.set_index('DT')

Or:

df1.set_index('DT',inplace=True)
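
To see why the assignment matters, here is a minimal sketch that uses a small hypothetical frame in place of DATA.csv (the real file is not shown). It suggests that set_index leaves the original untouched unless the result is assigned back. It also shows that the strftime call from the question turns the datetimes back into plain strings, so that step probably needs to be dropped as well before resample will work.

```python
import pandas as pd

# Hypothetical stand-in for DATA.csv (the real file is not shown in the question).
df1 = pd.DataFrame({"DT": ["2016-08-01", "2016-08-02"], "VALUE": [0.0, 0.0]})
df1["DT"] = pd.to_datetime(df1["DT"])

# set_index returns a new DataFrame; without assignment df1 is unchanged.
df1.set_index("DT")
index_type_before = type(df1.index).__name__   # still the default RangeIndex

# Assigning the result back actually replaces the index.
df1 = df1.set_index("DT")
index_type_after = type(df1.index).__name__    # now a DatetimeIndex

# strftime converts datetimes to plain strings, which resample cannot use.
as_strings = df1.index.strftime("%Y-%m-%d %H:%M:%S")
strings_are_datetimes = isinstance(as_strings, pd.DatetimeIndex)

print(index_type_before, index_type_after, strings_are_datetimes)
# prints: RangeIndex DatetimeIndex False
```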

I am assuming some initial rows of your dataset, as you mentioned:

          DT    VALUE
0   2016-08-01  0
1   2016-08-02  0
2   2016-08-03  0
3   2016-08-04  0
4   2016-08-05  0
5   2016-08-06  0
6   2016-08-07  0
7   2016-08-08  0
8   2016-08-09  0

Then set DT as the index like this:

df = df.set_index('DT')
df

Output:

           VALUE
   DT   
2016-08-01  0
2016-08-02  0
2016-08-03  0
2016-08-04  0
2016-08-05  0
2016-08-06  0
2016-08-07  0
2016-08-08  0
2016-08-09  0

Now resample your dataframe:

df = df.resample('H').ffill()
df

Output (showing the first few rows):

                VALUE
    DT  
2016-08-01 00:00:00 0
2016-08-01 01:00:00 0
2016-08-01 02:00:00 0
2016-08-01 03:00:00 0
2016-08-01 04:00:00 0
2016-08-01 05:00:00 0
2016-08-01 06:00:00 0
2016-08-01 07:00:00 0
2016-08-01 08:00:00 0
2016-08-01 09:00:00 0
2016-08-01 10:00:00 0
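
The steps above can be sketched end to end as follows. The frame is built inline from the assumed rows rather than read from DATA.csv, and the frequency is written as lowercase 'h', which newer pandas releases prefer over 'H'. Note that upsampling stops at the last timestamp in the index, so the final day contributes only its 00:00 row.

```python
import pandas as pd

# Assumed sample: nine daily rows, mirroring the table above.
df = pd.DataFrame({
    "DT": pd.date_range("2016-08-01", periods=9, freq="D"),
    "VALUE": [0] * 9,
})

# Make DT the index so resample has a DatetimeIndex to work with.
df = df.set_index("DT")

# Upsample to hourly rows and carry each daily value forward.
hourly = df.resample("h").ffill()

print(hourly.head(3))
print(len(hourly))  # 8 full days * 24 hours + 1 closing row = 193
```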

You could convert the index to a pd.DatetimeIndex and then resample that. I also don't think you need (or want) the strftime() call:

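import pandas as pd

df1 = pd.read_csv("DATA.csv")
df1['DT'] = pd.to_datetime(df1['DT'])    # keep real datetimes; no strftime()
df1 = df1.set_index('DT')                # assign back: set_index is not in-place
df1.index = pd.DatetimeIndex(df1.index)
df1 = df1.resample('H').ffill()          # upsample to hourly, forward-filling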

NOTE: You could probably combine a bunch of this and it would still be quite clear, like:

df1 = pd.read_csv("DATA.csv")
df1.index = pd.DatetimeIndex(pd.to_datetime(df1['DT']))
df1 = df1.drop(columns='DT').resample('H').ffill()  # DT now lives in the index
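
As a further variation, the datetime index can probably be built while the file is read. The sketch below is not from the answer itself: it relies on the parse_dates and index_col parameters of pd.read_csv, and it reads from a hypothetical two-row in-memory CSV instead of DATA.csv so that it runs on its own.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for DATA.csv.
csv_text = "DT,VALUE\n2018-12-13,24000\n2018-12-14,26000\n"

# parse_dates + index_col build the DatetimeIndex while reading.
df1 = pd.read_csv(io.StringIO(csv_text), parse_dates=["DT"], index_col="DT")

# Upsample to hourly rows; each daily value is carried forward
# until the next daily timestamp.
hourly = df1.resample("h").ffill()

print(hourly.loc["2018-12-13 23:00:00", "VALUE"])  # 24000
print(hourly.loc["2018-12-14 00:00:00", "VALUE"])  # 26000
```

With the real file, io.StringIO(csv_text) would be replaced by the path "DATA.csv".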


Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this post, please credit this site's URL or the original address.
