
Pandas Interpolate dataframe with new length

I have a dataframe with columns of Datetime, lat, lon, z. I am reading the data in from a csv file, so setting the period for the datetimes does not work. The times are in 6 hour intervals but I want to linearly interpolate the data to hourly intervals.

Go from

       'A'              'B'    'C'   'D'
0   2010-09-13 18:00:00 16.3 -78.5    1
1   2010-09-14 00:00:00 16.6 -79.8    6 
2   2010-09-14 06:00:00 17.0 -81.1    12

To

       'A'              'B'    'C'   'D'
1   2010-09-13 18:00:00 16.3  -78.5   1      
2   2010-09-13 19:00:00 16.35 -78.7   2
3   2010-09-13 20:00:00 16.4  -78.9   3
4   2010-09-13 21:00:00 16.45 -79.1   4
5   2010-09-13 22:00:00 16.5  -79.3   5
....

I have tried using the interpolate command, but it has no argument for a new length of the dataframe.

df.interpolate(method='linear')

I was thinking that I could use .loc to insert 5 rows of NaNs between each line in the data frame and then use the interpolation function, but that seems like a bad workaround.

Solution: Reindexing with a DatetimeIndex eliminates the association with the other columns if your initial column was not imported as datetime, so parse column 'A' and set it as the index first.

df['A'] = pd.to_datetime(df['A'])  # make sure 'A' is datetime, not str
i = pd.date_range(start=df['A'].min(), end=df['A'].max(), freq='h')  # hourly index
df = df.set_index('A').reindex(i).interpolate()  # align on time, then fill the gaps
print(df)

Gives the correct answer.
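If the data comes from a CSV file, one way to make sure the time column is imported as datetime is to let read_csv parse it and use it as the index. This is only a minimal sketch: the column names A to D follow the example above, and the inline CSV text is a stand-in for the real file, which is not shown in the question.

```python
import io

import pandas as pd

# Stand-in for the real CSV file (assumption: columns are named A, B, C, D).
csv_text = """A,B,C,D
2010-09-13 18:00:00,16.3,-78.5,1
2010-09-14 00:00:00,16.6,-79.8,6
2010-09-14 06:00:00,17.0,-81.1,12
"""

# parse_dates converts column A to datetimes; index_col makes it the index,
# so a later reindex() can align rows by timestamp.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["A"], index_col="A")

print(type(df.index).__name__)
print(df)
```

With the timestamps already in the index, reindexing against an hourly index keeps the original rows attached to their times.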

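# Assumes df is already indexed by the datetime column
i = pd.date_range(start=df.index.min(), end=df.index.max(), freq='h')  # hourly index
df = df.reindex(i).interpolate()  # new rows start as NaN, then get filled
print(df)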

Output:

2010-09-13 18:00:00  16.300000 -78.500000
2010-09-13 19:00:00  16.350000 -78.716667
2010-09-13 20:00:00  16.400000 -78.933333
2010-09-13 21:00:00  16.450000 -79.150000
2010-09-13 22:00:00  16.500000 -79.366667

  1. Create a new index with the desired frequency as a DatetimeIndex, for example with pd.date_range.

  2. reindex with this new index. By default, values for the new index entries will be np.nan.

  3. interpolate to fill in these missing values. You can supply the method kwarg to determine how interpolation is done.
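The three steps can be sketched end to end as follows. This is an illustration rather than the only way to do it: the sample frame is rebuilt by hand from the question's data, and the variable names hourly, expanded and result are made up for the example.

```python
import pandas as pd

# Sample data from the question, indexed by the datetime column.
df = pd.DataFrame(
    {"B": [16.3, 16.6, 17.0], "C": [-78.5, -79.8, -81.1], "D": [1, 6, 12]},
    index=pd.to_datetime(
        ["2010-09-13 18:00:00", "2010-09-14 00:00:00", "2010-09-14 06:00:00"]
    ),
)

# 1. New hourly index spanning the original time range.
hourly = pd.date_range(start=df.index.min(), end=df.index.max(), freq="h")

# 2. Reindex: the original rows are kept, the new rows are filled with NaN.
expanded = df.reindex(hourly)

# 3. Fill the gaps; method="linear" is the default and is spelled out here.
result = expanded.interpolate(method="linear")

print(result.head())
```

Note that method="linear" treats the rows as equally spaced and ignores the index values. That is fine here because the new index is regular; for irregular timestamps, method="time" interpolates according to the actual time gaps.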
