Forward fill pandas dataframe without duplicating values in rows
I have the following DataFrame, in which every blank cell is np.nan:
coupler_id                25       26       28       29
timestamp
2015-12-05 03:02:29      NaN      NaN  12017.0  12008.0
2015-12-05 03:04:47      NaN      NaN  12017.0  12008.0
2015-12-05 03:09:14      NaN      NaN  12017.0  12008.0
2015-12-05 03:12:12      NaN      NaN  12017.0  12008.0
2015-12-05 03:23:06      NaN      NaN      NaN  12008.0
2015-12-05 03:24:45      NaN      NaN      NaN  12017.0
2015-12-05 06:31:20      NaN      NaN  12017.0      NaN
2015-12-05 09:36:29      NaN      NaN  12011.0      NaN
2015-12-05 23:59:35      NaN      NaN      NaN  12017.0
2015-12-06 23:59:38      NaN      NaN      NaN  12017.0
I want to forward fill the missing values (with limit=1) without duplicating values within a row. The DataFrame above should therefore become:
coupler_id                25       26       28       29
timestamp
2015-12-05 03:02:29      NaN      NaN  12017.0  12008.0
2015-12-05 03:04:47      NaN      NaN  12017.0  12008.0
2015-12-05 03:09:14      NaN      NaN  12017.0  12008.0
2015-12-05 03:12:12      NaN      NaN  12017.0  12008.0
2015-12-05 03:23:06      NaN      NaN  12017.0  12008.0
2015-12-05 03:24:45      NaN      NaN      NaN  12017.0
2015-12-05 06:31:20      NaN      NaN  12017.0      NaN
2015-12-05 09:36:29      NaN      NaN  12011.0      NaN
2015-12-05 23:59:35      NaN      NaN  12011.0  12017.0
2015-12-06 23:59:38      NaN      NaN      NaN  12017.0
Edit:
What if columns 25 and 26 also contain data, and column 28 is no longer NaN at index 2015-12-05 03:24:45?
coupler_id                25       26       28       29
timestamp
2015-12-05 03:02:29      NaN      NaN  12017.0  12008.0
2015-12-05 03:04:47      NaN      NaN  12017.0  12008.0
2015-12-05 03:09:14      NaN      NaN  12017.0  12008.0
2015-12-05 03:12:12      NaN      NaN  12017.0  12008.0
2015-12-05 03:23:06  12007.0  12018.0      NaN  12008.0
2015-12-05 03:24:45  12033.0  12050.0  12025.0  12017.0
2015-12-05 06:31:20      NaN  12033.0  12017.0      NaN
2015-12-05 09:36:29  12008.0      NaN  12011.0      NaN
2015-12-05 23:59:35      NaN      NaN      NaN  12017.0
2015-12-06 23:59:38      NaN      NaN      NaN  12017.0
Updated answer
Here is a more general version that checks all columns:
import numpy as np
import pandas as pd

def remove_duplicates(data, ix, names):
    # with at most one non-null entry there is nothing to compare
    if data.notnull().sum() <= 1:
        return data
    # mark every non-null value that appears more than once in this row
    dupes = data.dropna().duplicated(keep=False)
    if dupes.any():
        for name in names:
            # if the cell was NaN in the original df (so its value came from
            # the forward fill) AND it duplicates another value, undo the fill;
            # .get avoids a KeyError for cells that are still NaN
            if pd.isna(df.loc[ix, name]) and dupes.get(name, False):
                data[name] = np.nan
    return data

filled = df.ffill(limit=1)
filled.apply(lambda row: remove_duplicates(row, row.name, row.index), axis=1)
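As a cross-check, here is a vectorized sketch of the same rule (undo any forward-filled cell whose value duplicates another value in its row). It swaps the row-wise apply for a NumPy pairwise comparison and takes the original frame as an argument instead of reading a global df. The helper name ffill_without_row_dupes is hypothetical, and the sample data is rebuilt by hand from the edited question; on that data it should reproduce the table shown below.

```python
import numpy as np
import pandas as pd


def ffill_without_row_dupes(df, limit=1):
    """Forward-fill down each column, then undo any filled cell whose value
    also appears elsewhere in the same row (hypothetical helper name)."""
    filled = df.ffill(limit=limit)
    # True for cells that were NaN in df and received a value from ffill
    was_filled = (df.isna() & filled.notna()).to_numpy()
    vals = filled.to_numpy()
    # Pairwise equality within each row, shape (rows, cols, cols);
    # NaN never compares equal, so NaN cells are never counted as duplicates
    same = vals[:, :, None] == vals[:, None, :]
    # Every non-NaN cell equals itself once, so "> 1" means it also
    # equals at least one other cell in its row
    is_dupe = same.sum(axis=2) > 1
    # Blank out only the filled cells that duplicate another value
    return filled.mask(was_filled & is_dupe)


# Sample data from the edited question (assumed string column labels)
nan = np.nan
index = pd.to_datetime([
    "2015-12-05 03:02:29", "2015-12-05 03:04:47", "2015-12-05 03:09:14",
    "2015-12-05 03:12:12", "2015-12-05 03:23:06", "2015-12-05 03:24:45",
    "2015-12-05 06:31:20", "2015-12-05 09:36:29", "2015-12-05 23:59:35",
    "2015-12-06 23:59:38",
])
df = pd.DataFrame(
    {
        "25": [nan] * 4 + [12007.0, 12033.0, nan, 12008.0, nan, nan],
        "26": [nan] * 4 + [12018.0, 12050.0, 12033.0, nan, nan, nan],
        "28": [12017.0] * 4 + [nan, 12025.0, 12017.0, 12011.0, nan, nan],
        "29": [12008.0] * 5 + [12017.0, nan, nan, 12017.0, 12017.0],
    },
    index=index,
)

result = ffill_without_row_dupes(df)
print(result)
```

Because limit is applied before the undo step, a cell blanked here still counts as "filled" for the purposes of limit, so the next NaN below it stays NaN, which matches the expected output in the question.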
25 26 28 29
2015-12-05 03:02:29 NaN NaN 12017.0 12008.0
2015-12-05 03:04:47 NaN NaN 12017.0 12008.0
2015-12-05 03:09:14 NaN NaN 12017.0 12008.0
2015-12-05 03:12:12 NaN NaN 12017.0 12008.0
2015-12-05 03:23:06 12007.0 12018.0 12017.0 12008.0
2015-12-05 03:24:45 12033.0 12050.0 12025.0 12017.0
2015-12-05 06:31:20 NaN 12033.0 12017.0 NaN
2015-12-05 09:36:29 12008.0 12033.0 12011.0 NaN
2015-12-05 23:59:35 12008.0 NaN 12011.0 12017.0
2015-12-06 23:59:38 NaN NaN NaN 12017.0
Previous answer
You can use ffill(limit=1), then check whether the two values in a row are duplicates and whether either column was NaN in the previous row.
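import numpy as np
import pandas as pd

def remove_duplicates(data, ix, names):
    # keep the row unchanged unless both columns hold the same value
    if data.iloc[0] != data.iloc[1]:
        return data
    # the column that was NaN in the previous (filled) row cannot have been
    # forward-filled here, so it keeps its value and the other one is blanked
    if np.isnan(filled.loc[ix - 1, names[0]]):
        return pd.Series([data.iloc[0], np.nan], index=names)
    elif np.isnan(filled.loc[ix - 1, names[1]]):
        return pd.Series([np.nan, data.iloc[1]], index=names)
    return data

filled = df[["28", "29"]].ffill(limit=1)
df[["28", "29"]] = filled.apply(
    lambda row: remove_duplicates(row, row.name, row.index), axis=1
)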
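The same two-column heuristic can also be sketched without apply, using shift to look at the previous filled row. This avoids relying on an integer index (ix - 1) and on the global filled. It is a hypothetical rewrite, not the answerer's code: the helper name ffill_pair_without_dupes and its parameters are assumptions, and the sample data is the question's original frame typed in by hand.

```python
import numpy as np
import pandas as pd


def ffill_pair_without_dupes(df, a="28", b="29", limit=1):
    """Vectorized sketch of the two-column heuristic (hypothetical helper)."""
    filled = df[[a, b]].ffill(limit=limit)
    prev = filled.shift(1)  # previous row of the *filled* frame
    same = filled[a] == filled[b]  # NaN == NaN is False, so NaN pairs are skipped
    # If a was NaN in the previous filled row, a cannot have been forward-filled
    # here, so a keeps its value and b is treated as the copy
    blank_b = same & prev[a].isna()
    # Otherwise, if b was NaN in the previous filled row, blank a instead
    blank_a = same & ~blank_b & prev[b].isna()
    out = df.copy()
    out[a] = filled[a].mask(blank_a)
    out[b] = filled[b].mask(blank_b)
    return out


# Original sample data from the question (assumed string column labels)
nan = np.nan
index = pd.to_datetime([
    "2015-12-05 03:02:29", "2015-12-05 03:04:47", "2015-12-05 03:09:14",
    "2015-12-05 03:12:12", "2015-12-05 03:23:06", "2015-12-05 03:24:45",
    "2015-12-05 06:31:20", "2015-12-05 09:36:29", "2015-12-05 23:59:35",
    "2015-12-06 23:59:38",
])
df = pd.DataFrame(
    {
        "25": [nan] * 10,
        "26": [nan] * 10,
        "28": [12017.0] * 4 + [nan, nan, 12017.0, 12011.0, nan, nan],
        "29": [12008.0] * 5 + [12017.0, nan, nan, 12017.0, 12017.0],
    },
    index=index,
)

result = ffill_pair_without_dupes(df)
print(result)
```

On the first row, shift yields NaN for both columns, so an equal pair there would blank b; the apply-based version would instead fail looking up row ix - 1.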
df
coupler_id 25 26 28 29
0 2015-12-05 03:02:29 NaN NaN 12017.0 12008.0
1 2015-12-05 03:04:47 NaN NaN 12017.0 12008.0
2 2015-12-05 03:09:14 NaN NaN 12017.0 12008.0
3 2015-12-05 03:12:12 NaN NaN 12017.0 12008.0
4 2015-12-05 03:23:06 NaN NaN 12017.0 12008.0
5 2015-12-05 03:24:45 NaN NaN NaN 12017.0
6 2015-12-05 06:31:20 NaN NaN 12017.0 NaN
7 2015-12-05 09:36:29 NaN NaN 12011.0 NaN
8 2015-12-05 23:59:35 NaN NaN 12011.0 12017.0
9 2015-12-06 23:59:38 NaN NaN NaN 12017.0
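According to the documentation, ffill is a synonym for DataFrame.fillna(method='ffill'), so passing the limit argument to ffill caps how many consecutive NaN values are filled: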
df = df.ffill(limit=1)
Example:

temp
Out[224]:
     X    Y    Z
0  0.0  0.0  0.0
1  1.0  2.0  2.0
2  NaN  NaN  NaN
3  NaN  3.0  3.0
4  1.0  NaN  NaN
5  NaN  NaN  NaN

temp.ffill(limit=1)
Out[225]:
     X    Y    Z
0  0.0  0.0  0.0
1  1.0  2.0  2.0
2  1.0  2.0  2.0
3  NaN  3.0  3.0
4  1.0  3.0  3.0
5  1.0  NaN  NaN
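A minimal check of how limit behaves on a single Series: it caps each run of consecutive NaNs separately, restarting after every valid value (which is why column X above gets filled again at row 5). The Series values are made up for illustration. Note that in recent pandas versions (2.1 and later) fillna(method='ffill') is deprecated in favour of calling ffill directly.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 2.0, np.nan, np.nan])

# limit=1: at most one NaN is filled in each run of consecutive NaNs
print(s.ffill(limit=1).tolist())  # [1.0, 1.0, nan, 2.0, 2.0, nan]

# no limit: every NaN after a valid value is filled
print(s.ffill().tolist())  # [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
```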