Replace repeated values with NaN except the first, in a pandas column
I have a data frame like this:
df
col1 col2
1 A
2 A
3 B
4 C
5 C
6 C
7 B
8 B
9 A
Here A, B and C each occur in consecutive runs. I want to keep the value only in the row where each run starts, and set the other values of the same run to NaN.
The final data frame I am looking for looks like this:
df
col1 col2
1 A
2 NA
3 B
4 C
5 NA
6 NA
7 B
8 NA
9 A
I can do it with a for loop that compares each value to the previous one, but that will be slow. I am looking for a Pythonic way to do it, perhaps with some pandas shortcut.
Compare the column with its Series.shift-ed values, and set the non-matching rows to missing values with Series.where or numpy.where:
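import numpy as np
import pandas as pd

# Keep a value only where it differs from the previous row; NaN elsewhere
df['col2'] = df['col2'].where(df['col2'].ne(df['col2'].shift()))
# Alternative with numpy.where:
# df['col2'] = np.where(df['col2'].ne(df['col2'].shift()), df['col2'], np.nan)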
Or use DataFrame.loc with the condition inverted by ~:
df.loc[~df['col2'].ne(df['col2'].shift()), 'col2'] = np.nan
Or, thanks to @Daniel Mesejo, use eq for ==:
df.loc[df['col2'].eq(df['col2'].shift()), 'col2'] = np.nan
print(df)
col1 col2
0 1 A
1 2 NaN
2 3 B
3 4 C
4 5 NaN
5 6 NaN
6 7 B
7 8 NaN
8 9 A
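For anyone who wants to reproduce the result end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and applies the Series.where approach. The names is_run_start and result are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col1": range(1, 10),
    "col2": list("AABCCCBBA"),
})

# True where a value differs from the row directly above it, i.e. where a run starts.
# shift() leaves a missing value in the first row, so the first row always counts as a start.
is_run_start = df["col2"].ne(df["col2"].shift())

# Work on a copy so the original frame stays unchanged
result = df.copy()
result["col2"] = result["col2"].where(is_run_start)
print(result)
```

Working on a copy is a design choice for the sketch only; assigning back to df['col2'] as in the answer works the same way.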
Detail:
print(df['col2'].ne(df['col2'].shift()))
0 True
1 False
2 True
3 True
4 False
5 False
6 True
7 False
8 True
Name: col2, dtype: bool
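To see where this mask comes from, it may help to put the column, its shifted copy and the comparison side by side. The sketch below does that on the same sample values; the steps frame and its column names are hypothetical and used only for illustration.

```python
import pandas as pd

s = pd.Series(list("AABCCCBBA"), name="col2")

# Each row of 'shifted' holds the value from the previous row of 'col2';
# the first row has no predecessor, so it holds a missing value.
steps = pd.DataFrame({
    "col2": s,
    "shifted": s.shift(),
    "is_run_start": s.ne(s.shift()),
})
print(steps)
```

A row is marked True exactly when col2 and shifted differ, which is why only the first row of each consecutive run keeps its value.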