
Interpolation of missing values, not NA

I want to linearly interpolate my data, but the missing values are not marked as NA: the rows themselves are absent.

Here is my data, which has many missing values:

timestamp      id  strength
1383260400000  1   -0.3803901328171995
1383261000000  1   -0.42196042219455937
1383265200000  1   -0.460714706261982

My expected output:

timestamp      id  strength
1383260400000  1   -0.3803901328171995
1383261000000  1   -0.42196042219455937
1383261600000  1   (linearly interpolated value)
1383262200000  1   (linearly interpolated value)
1383262800000  1   (linearly interpolated value)
1383263400000  1   (linearly interpolated value)
1383264000000  1   (linearly interpolated value)
1383264600000  1   (linearly interpolated value)
1383265200000  1   -0.460714706261982

The timestamps start at 1383260400000 and end at 1383343800000, and the other ids (from 1 to 2025) have the same issue.

The idea is to convert the timestamps to datetimes, set them as a DatetimeIndex, and then, in a lambda function applied per id, add the missing datetimes with Series.asfreq and fill them with interpolate:

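import pandas as pd

# convert epoch milliseconds to datetimes
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')

# per id: insert the missing 10-minute rows as NaN, then fill them linearly
f = lambda x: x.asfreq('10min').interpolate()
df = df.set_index('timestamp').groupby('id')['strength'].apply(f).reset_index()
print(df)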
   id           timestamp  strength
0   1 2013-10-31 23:00:00 -0.380390
1   1 2013-10-31 23:10:00 -0.421960
2   1 2013-10-31 23:20:00 -0.427497
3   1 2013-10-31 23:30:00 -0.433033
4   1 2013-10-31 23:40:00 -0.438569
5   1 2013-10-31 23:50:00 -0.444106
6   1 2013-11-01 00:00:00 -0.449642
7   1 2013-11-01 00:10:00 -0.455178
8   1 2013-11-01 00:20:00 -0.460715
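
To see what each step contributes, the sketch below rebuilds the three sample rows for a single id and separates asfreq from interpolate. The variable names (s, with_gaps, filled) are illustrative only and are not part of the answer above.

```python
import pandas as pd

# The three rows from the question, for a single id.
s = pd.Series(
    [-0.3803901328171995, -0.42196042219455937, -0.460714706261982],
    index=pd.to_datetime([1383260400000, 1383261000000, 1383265200000], unit="ms"),
    name="strength",
)

# asfreq inserts the missing 10-minute timestamps as NaN rows.
with_gaps = s.asfreq("10min")

# interpolate then fills those NaN rows linearly between the known neighbours.
filled = with_gaps.interpolate()

print(with_gaps.isna().sum())  # how many rows were missing
print(filled)
```

Because asfreq produces evenly spaced rows, the default linear method (which treats rows as equally spaced) gives the same result here as interpolating against the time axis.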

Finally, if you need the timestamps in their original format:

import numpy as np

# datetime64[ns] stores nanoseconds, so divide by 10**6 to get milliseconds
df['timestamp'] = df['timestamp'].astype(np.int64) // 1000000

print(df)
   id      timestamp  strength
0   1  1383260400000 -0.380390
1   1  1383261000000 -0.421960
2   1  1383261600000 -0.427497
3   1  1383262200000 -0.433033
4   1  1383262800000 -0.438569
5   1  1383263400000 -0.444106
6   1  1383264000000 -0.449642
7   1  1383264600000 -0.455178
8   1  1383265200000 -0.460715
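
The division by 1000000 relies on the column being stored with nanosecond resolution. As a hedged alternative, the sketch below computes epoch milliseconds by subtracting the Unix epoch and floor-dividing by a one-millisecond Timedelta, which does not depend on the stored resolution. The names ts and epoch_ms are illustrative.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime([1383260400000, 1383261000000], unit="ms"))

# Elapsed time since the Unix epoch, floor-divided by one millisecond.
epoch_ms = (ts - pd.Timestamp("1970-01-01")) // pd.Timedelta(milliseconds=1)

print(epoch_ms.tolist())
```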

EDIT:

import pandas as pd

# data from the question
df = pd.DataFrame({'timestamp': [1383260400000, 1383261000000, 1383265200000],
                   'id': [1, 1, 1],
                   'strength': [-0.3803901328171995, -0.4219604221945593, -0.460714706261982]})

print(df)
       timestamp  id  strength
0  1383260400000   1 -0.380390
1  1383261000000   1 -0.421960
2  1383265200000   1 -0.460715

This solution creates all the datetimes for each id with date_range, adds the missing rows with DataFrame.reindex on a MultiIndex, and finally interpolates per id:

df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')

# full 10-minute grid from the first to the last timestamp
r = pd.date_range(pd.to_datetime(1383260400000, unit='ms'),
                  pd.to_datetime(1383343800000, unit='ms'),
                  freq='10min')

ids = df['id'].unique()

# every (timestamp, id) combination; reindex adds the missing ones as NaN
mux = pd.MultiIndex.from_product([r, ids], names=['timestamp', 'id'])
f = lambda x: x.interpolate()
df = (df.set_index(['timestamp', 'id'])
        .reindex(mux)
        .groupby('id')['strength']
        .transform(f)
        .reset_index())

print(df)
              timestamp  id  strength
0   2013-10-31 23:00:00   1 -0.380390
1   2013-10-31 23:10:00   1 -0.421960
2   2013-10-31 23:20:00   1 -0.427497
3   2013-10-31 23:30:00   1 -0.433033
4   2013-10-31 23:40:00   1 -0.438569
..                  ...  ..       ...
135 2013-11-01 21:30:00   1 -0.460715
136 2013-11-01 21:40:00   1 -0.460715
137 2013-11-01 21:50:00   1 -0.460715
138 2013-11-01 22:00:00   1 -0.460715
139 2013-11-01 22:10:00   1 -0.460715

[140 rows x 3 columns]
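
Rows 135 to 139 above all repeat -0.460715 because, by default, interpolate also fills NaN values after the last known observation with that last value. If those trailing rows should stay empty instead, one option is the limit_area='inside' parameter of Series.interpolate. The toy series below is made up for illustration and is not the question's data.

```python
import numpy as np
import pandas as pd

# Toy series: one leading gap, one inner gap, two trailing gaps.
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan, np.nan])

# Default: the inner gap is interpolated and trailing NaN take the last valid value.
default = s.interpolate()

# limit_area="inside": only NaN surrounded by valid values are filled.
inside = s.interpolate(limit_area="inside")

print(default.tolist())
print(inside.tolist())
```

In the reindex solution this would mean using lambda x: x.interpolate(limit_area='inside') as f, under the assumption that values outside the observed range should not be extrapolated.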
