Replace consecutive repeated values in a pandas column with NaN, keeping the first

Question

I have a data frame like this:

df
col1    col2
  1       A
  2       A
  3       B
  4       C
  5       C
  6       C
  7       B
  8       B
  9       A

Now we can see that there are consecutive runs of A, B and C. I want to keep only the rows where a run starts; the other values in the same run should become NaN.

The final data frame I am looking for should look like this:

df
col1    col2
  1       A
  2       NA
  3       B
  4       C
  5       NA
  6       NA
  7       B
  8       NA
  9       A

I can do it with a for loop and comparisons, but the execution time will be longer. I am looking for a Pythonic way to do it, perhaps with some pandas shortcuts.

Answer 1

Compare each value with the Series.shift-ed values, then set the repeats to missing values with Series.where or numpy.where:

df['col2'] = df['col2'].where(df['col2'].ne(df['col2'].shift()))
# alternative (requires: import numpy as np)
#df['col2'] = np.where(df['col2'].ne(df['col2'].shift()), df['col2'], np.nan)
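For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same `where` / `ne` / `shift` idea. The helper name `keep_run_starts` and the variable `result` are made up for illustration; they are not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "col1": range(1, 10),
    "col2": list("AABCCCBBA"),
})


def keep_run_starts(s: pd.Series) -> pd.Series:
    """Keep the first value of each consecutive run; set the rest to NaN."""
    # A row starts a run when it differs from the row directly above it.
    is_run_start = s.ne(s.shift())
    # Series.where keeps values where the condition is True, NaN elsewhere.
    return s.where(is_run_start)


# assign returns a new frame, so the original df is left untouched.
result = df.assign(col2=keep_run_starts(df["col2"]))
print(result)
```

Wrapping the one-liner in a function is only a convenience so it can be reused on other columns; the logic is the same as in the line above.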

Or use DataFrame.loc with the condition inverted by ~:

df.loc[~df['col2'].ne(df['col2'].shift()), 'col2'] = np.nan

Or, thanks to @Daniel Mesejo, use eq (the method form of ==) so no inversion is needed:

df.loc[df['col2'].eq(df['col2'].shift()), 'col2'] = np.nan

print(df)
   col1 col2
0     1    A
1     2  NaN
2     3    B
3     4    C
4     5  NaN
5     6  NaN
6     7    B
7     8  NaN
8     9    A
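The variants above should all blank out the same rows. A small sketch to check that on the sample column, assuming the data from the question (the variable names here are illustrative only):

```python
import numpy as np
import pandas as pd

col = pd.Series(list("AABCCCBBA"), name="col2")

# True where a row repeats the value of the row directly above it.
is_repeat = col.eq(col.shift())

# Variant 1: Series.where keeps values where the condition holds.
via_where = col.where(~is_repeat)

# Variant 2: numpy.where returns an array, so wrap it back into a Series.
via_np_where = pd.Series(np.where(~is_repeat, col, np.nan), name="col2")

# Variant 3: assign NaN in place through .loc on a copy.
via_loc = col.copy()
via_loc.loc[is_repeat] = np.nan

# Compare the positions of the missing values across the three results.
same_nans = (
    via_where.isna().equals(via_np_where.isna())
    and via_where.isna().equals(via_loc.isna())
)
print(same_nans)
```

The .loc form modifies the data in place, while where and numpy.where build a new column, so the choice is mostly a matter of style.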

Detail:

print(df['col2'].ne(df['col2'].shift()))
0     True
1    False
2     True
3     True
4    False
5    False
6     True
7    False
8     True
Name: col2, dtype: bool
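To see where that mask comes from, it can help to put the column, its shifted copy and the comparison side by side. This is only an illustrative sketch on the question's data; the column names `shifted` and `is_run_start` are invented here. shift() moves every value down one row and leaves a missing value in the first row, which never compares equal to anything, so the first row always counts as the start of a run.

```python
import pandas as pd

col = pd.Series(list("AABCCCBBA"), name="col2")

steps = pd.DataFrame({
    "col2": col,
    # Each row now holds the value of the row above; row 0 is missing.
    "shifted": col.shift(),
    # True where the value differs from the row above, i.e. a run starts.
    "is_run_start": col.ne(col.shift()),
})
print(steps)
```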

Answer 1: accepted, 2019-10-24 10:01:46
