Interpolating missing values that are not NA

Question

I want to linearly interpolate my data, but the missing values are not NA: the rows are simply absent.

Here is my data, which has many missing rows:

timestamp	id	strength
1383260400000	1	-0.3803901328171995
1383261000000	1	-0.42196042219455937
1383265200000	1	-0.460714706261982

My expected output:

timestamp	id	strength
1383260400000	1	-0.3803901328171995
1383261000000	1	-0.42196042219455937
1383261600000	1	(linearly interpolated value)
1383262200000	1	(linearly interpolated value)
1383262800000	1	(linearly interpolated value)
1383263400000	1	(linearly interpolated value)
1383264000000	1	(linearly interpolated value)
1383264600000	1	(linearly interpolated value)
1383265200000	1	-0.460714706261982

The timestamps start at 1383260400000 and end at 1383343800000, and the other ids (1 to 2025) have the same problem.

Answer 1

The idea is to convert the timestamps to datetimes, set them as a DatetimeIndex, and then, in a lambda function applied per id, add the missing datetimes with Series.asfreq and fill them with interpolate:

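import numpy as np
import pandas as pd

df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')

# per id: put the series on a regular 10-minute grid, then fill the new NaNs linearly
f = lambda x: x.asfreq('10min').interpolate()
df = df.set_index('timestamp').groupby('id')['strength'].apply(f).reset_index()
print(df)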
   id           timestamp  strength
0   1 2013-10-31 23:00:00 -0.380390
1   1 2013-10-31 23:10:00 -0.421960
2   1 2013-10-31 23:20:00 -0.427497
3   1 2013-10-31 23:30:00 -0.433033
4   1 2013-10-31 23:40:00 -0.438569
5   1 2013-10-31 23:50:00 -0.444106
6   1 2013-11-01 00:00:00 -0.449642
7   1 2013-11-01 00:10:00 -0.455178
8   1 2013-11-01 00:20:00 -0.460715
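
To make the two steps inside the lambda visible, here is a minimal sketch on a single Series built from the three rows in the question. The names `s`, `regular`, and `filled` are illustrative only and do not come from the original answer.

```python
import pandas as pd

# The three observations from the question, indexed by datetime.
s = pd.Series(
    [-0.3803901328171995, -0.42196042219455937, -0.460714706261982],
    index=pd.to_datetime(
        [1383260400000, 1383261000000, 1383265200000], unit="ms"
    ),
    name="strength",
)

# Step 1: asfreq puts the series on a regular 10-minute grid.
# Grid slots that had no observation appear as NaN rows.
regular = s.asfreq("10min")
print(regular)

# Step 2: interpolate fills those NaN rows linearly between
# the neighbouring observed values.
filled = regular.interpolate()
print(filled)
```

Because the grid is evenly spaced, the default linear method (which treats rows as equally spaced) gives the same result as a time-weighted interpolation would here.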

Finally, if you need the timestamps back in their original format (milliseconds since the epoch):

# assumes the column is datetime64[ns], so the integers are nanoseconds
df['timestamp'] = df['timestamp'].astype(np.int64) // 1000000

print (df)
   id      timestamp  strength
0   1  1383260400000 -0.380390
1   1  1383261000000 -0.421960
2   1  1383261600000 -0.427497
3   1  1383262200000 -0.433033
4   1  1383262800000 -0.438569
5   1  1383263400000 -0.444106
6   1  1383264000000 -0.449642
7   1  1383264600000 -0.455178
8   1  1383265200000 -0.460715
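
If you would rather not depend on the column's internal resolution, one alternative is to subtract the epoch and floor-divide by a one-millisecond Timedelta. This is a sketch of that approach, not part of the original answer; `ts`, `epoch`, and `ms` are illustrative names.

```python
import pandas as pd

# Two example datetimes taken from the 10-minute grid above.
ts = pd.Series(pd.to_datetime([1383260400000, 1383261600000], unit="ms"))

# Subtracting the epoch gives timedeltas; floor-dividing by 1 ms
# turns them into integer milliseconds since 1970-01-01.
epoch = pd.Timestamp("1970-01-01")
ms = (ts - epoch) // pd.Timedelta(milliseconds=1)
print(ms)
```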

EDIT:

# data from the question
df = pd.DataFrame({'timestamp': [1383260400000, 1383261000000, 1383265200000],
                   'id': [1, 1, 1],
                   'strength': [-0.3803901328171995, -0.4219604221945593, -0.460714706261982]})
    
print (df)
       timestamp  id  strength
0  1383260400000   1 -0.380390
1  1383261000000   1 -0.421960
2  1383265200000   1 -0.460715

This solution builds every expected datetime with date_range, adds the missing rows for each id via DataFrame.reindex with a MultiIndex, and finally interpolates per id:

df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')

# full 10-minute grid between the overall start and end given in the question
r = pd.date_range(pd.to_datetime(1383260400000, unit='ms'),
                  pd.to_datetime(1383343800000, unit='ms'),
                  freq='10min')

ids = df['id'].unique()

mux = pd.MultiIndex.from_product([r, ids], names=['timestamp','id'])
f = lambda x: x.interpolate()
df = (df.set_index(['timestamp', 'id'])
        .reindex(mux)
        .groupby('id')['strength']
        .transform(f)
        .reset_index())

print (df)
              timestamp  id  strength
0   2013-10-31 23:00:00   1 -0.380390
1   2013-10-31 23:10:00   1 -0.421960
2   2013-10-31 23:20:00   1 -0.427497
3   2013-10-31 23:30:00   1 -0.433033
4   2013-10-31 23:40:00   1 -0.438569
..                  ...  ..       ...
135 2013-11-01 21:30:00   1 -0.460715
136 2013-11-01 21:40:00   1 -0.460715
137 2013-11-01 21:50:00   1 -0.460715
138 2013-11-01 22:00:00   1 -0.460715
139 2013-11-01 22:10:00   1 -0.460715

[140 rows x 3 columns]
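
Note that the rows after the last observation repeat -0.460715, because interpolate with its default settings also fills trailing NaNs with the last valid value. If only gaps between observations should be filled, one option is to pass `limit_area="inside"`. The following is a sketch under that assumption, reusing the data from the question; `out` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": [1383260400000, 1383261000000, 1383265200000],
    "id": [1, 1, 1],
    "strength": [-0.3803901328171995, -0.4219604221945593, -0.460714706261982],
})
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms")

# Full 10-minute grid over the range given in the question.
r = pd.date_range(
    pd.to_datetime(1383260400000, unit="ms"),
    pd.to_datetime(1383343800000, unit="ms"),
    freq="10min",
)
mux = pd.MultiIndex.from_product(
    [r, df["id"].unique()], names=["timestamp", "id"]
)

# limit_area="inside" fills only NaNs that lie between two valid values,
# so rows after the last observation of an id stay NaN.
out = (
    df.set_index(["timestamp", "id"])
      .reindex(mux)
      .groupby(level="id")["strength"]
      .transform(lambda s: s.interpolate(limit_area="inside"))
      .reset_index()
)
print(out)
```

Whether trailing rows should be left empty or carried forward depends on what the data represents, so treat this as one possible choice rather than a correction.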

Solution 1
0 Accepted 2022-05-10 11:40:53
