Remove pandas rows based on the previous row
I have the following dataframe, whose values should be increasing. The original dataframe has some unknown values (the empty cells).
index | value |
---|---|
0 | 1 |
1 | |
2 | |
3 | 2 |
4 | |
5 | |
6 | |
7 | 4 |
8 | |
9 | |
10 | 3 |
11 | 3 |
12 | |
13 | |
14 | |
15 | 5 |
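Since the values are supposed to increase, I want to remove the rows at index 10 and 11 (both have the value 3, which is smaller than the earlier 4). This would be the desired dataframe: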
index | value |
---|---|
0 | 1 |
1 | |
2 | |
3 | 2 |
4 | |
5 | |
6 | |
7 | 4 |
8 | |
9 | |
12 | |
13 | |
14 | |
15 | 5 |
Many thanks.
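To try the answers below, the sample frame from the question can be rebuilt like this. This is a sketch that assumes the empty cells are NaN and that the frame has an `index` column next to `value`, as the answers' code expects:

```python
import numpy as np
import pandas as pd

# Known values from the question, keyed by row position; every other cell is NaN.
known = {0: 1, 3: 2, 7: 4, 10: 3, 11: 3, 15: 5}
df = pd.DataFrame({
    'index': range(16),
    'value': [known.get(i, np.nan) for i in range(16)],
})
print(df)
```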
Assuming the empty cells contain NaN (if they don't, temporarily replace them with NaN), use boolean indexing:
```python
import pandas as pd

# if the empty cells are not NaN, uncomment the line below
# and use s in place of df['value'] afterwards
# s = pd.to_numeric(df['value'], errors='coerce')

# is the cell empty?
m1 = df['value'].isna()
# is the value at least as large as every earlier value (the running maximum)?
m2 = df['value'].ge(df['value'].cummax())

out = df[m1 | m2]
```
Output:

```
    index  value
0       0    1.0
1       1    NaN
2       2    NaN
3       3    2.0
4       4    NaN
5       5    NaN
6       6    NaN
7       7    4.0
8       8    NaN
9       9    NaN
12     12    NaN
13     13    NaN
14     14    NaN
15     15    5.0
```
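If the empty cells are empty strings rather than NaN, the commented-out `pd.to_numeric(..., errors='coerce')` line handles them: anything that cannot be parsed becomes NaN, and the same two masks apply. A minimal sketch, assuming a hypothetical string-valued `value` column:

```python
import pandas as pd

# Hypothetical input: unknown cells are empty strings instead of NaN.
raw = ['1', '', '', '2', '', '', '', '4', '', '', '3', '3', '', '', '', '5']
df = pd.DataFrame({'index': range(len(raw)), 'value': raw})

# Parse to numbers; '' cannot be parsed, so errors='coerce' turns it into NaN.
s = pd.to_numeric(df['value'], errors='coerce')

m1 = s.isna()            # is the cell empty?
m2 = s.ge(s.cummax())    # is the value at least the running maximum so far?
out = df[m1 | m2]        # rows 10 and 11 are dropped; original strings are kept
print(out)
```

Only the masks are computed from `s`; `df` itself is filtered, so the original column is returned unchanged.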
Alternatively, a loop that compares each known value with the last value that was kept:

```python
import pandas as pd

def del_df(df):
    # Rows with a known value only, renumbered 0..n-1
    df_no_na = df.dropna(subset=['value']).reset_index(drop=True)
    num_tmp = df_no_na['value'][0]  # first value which is not NaN
    del_index_list = []  # entries of the 'index' column to delete
    for row_index in range(1, len(df_no_na)):
        if df_no_na['value'][row_index] > num_tmp:  # increasing
            num_tmp = df_no_na['value'][row_index]  # reference for the following values
        else:  # not increasing (same or decreasing)
            del_index_list.append(df_no_na['index'][row_index])  # index to delete
    # Drop the rows whose 'index' column matches a collected entry
    df_goal = df[~df['index'].isin(del_index_list)]
    return df_goal
```
Output:

```
    index  value
0       0    1.0
1       1    NaN
2       2    NaN
3       3    2.0
4       4    NaN
5       5    NaN
6       6    NaN
7       7    4.0
8       8    NaN
9       9    NaN
12     12    NaN
13     13    NaN
14     14    NaN
15     15    5.0
```
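Both answers give the same result on the question's data, but they treat repeated values differently: `ge(cummax())` keeps a value that equals the running maximum, while the loop's `>` drops it. A minimal sketch on a small hypothetical frame (the two function names are made up for illustration; the second one restates the loop's rule rather than calling `del_df`):

```python
import numpy as np
import pandas as pd

# Hypothetical data with a repeated value (2, 2) where the two rules disagree.
df = pd.DataFrame({'index': range(6), 'value': [1, np.nan, 2, 2, np.nan, 3]})

def keep_ge_running_max(df):
    # Boolean-indexing rule: keep empty cells and values >= the running maximum.
    s = df['value']
    return df[s.isna() | s.ge(s.cummax())]

def keep_strictly_greater(df):
    # Loop rule: keep a known value only if it is strictly greater
    # than the largest known value kept so far.
    keep, last = [], None
    for label, v in df['value'].items():
        if pd.isna(v):
            keep.append(label)  # empty cells are always kept
        elif last is None or v > last:
            keep.append(label)
            last = v
    return df.loc[keep]

print(keep_ge_running_max(df).index.tolist())    # [0, 1, 2, 3, 4, 5]
print(keep_strictly_greater(df).index.tolist())  # [0, 1, 2, 4, 5]
```

If repeated values should also be removed, the strict comparison is the one to use.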
I hope this answers your question.