简体   繁体   English

如何将熊猫数据帧系列中的连续重复值更改为 nan 或 0?

[英]How to change consecutive repeating values in pandas dataframe series to nan or 0?

I have a pandas dataframe created from measured numbers.我有一个根据测量数字创建的熊猫数据框。 When something goes wrong with the measurement, the last value is repeated.当测量出现问题时,会重复最后一个值。 I would like to do two things:我想做两件事:
1. Change all repeating values either to nan or 0. 1. 将所有重复值更改为 nan 或 0。
2. Keep the first repeating value and change all other values nan or 0. 2. 保留第一个重复值并将所有其他值更改为 nan 或 0。

I have found solutions using "shift" but they drop repeating values.我找到了使用“shift”的解决方案,但它们会删除重复值。 I do not want to drop repeating values.My data frame looks like this:我不想删除重复值。我的数据框如下所示:

df = pd.DataFrame(np.random.randn(15, 3))
df.iloc[4:8,0]=40
df.iloc[12:15,1]=22
df.iloc[10:12,2]=0.23

giving a dataframe like this:给出这样的数据框:

            0          1         2
0    1.239916   1.109434  0.305490
1    0.248682   1.472628  0.630074
2   -0.028584  -1.116208  0.074299
3   -0.784692  -0.774261 -1.117499
4   40.000000   0.283084 -1.495734
5   40.000000  -0.074763 -0.840403
6   40.000000   0.709794 -1.000048
7   40.000000   0.920943  0.681230
8   -0.701831   0.547689 -0.128996
9   -0.455691   0.610016  0.420240
10  -0.856768  -1.039719  0.230000
11   1.187208   0.964340  0.230000
12   0.116258  22.000000  1.119744
13  -0.501180  22.000000  0.558941
14   0.551586  22.000000 -0.993749

what I would like to be able to do is write some code that would filter the data and give me a data frame like this:我希望能够做的是编写一些代码来过滤数据并给我一个这样的数据框:

           0         1         2
0   1.239916  1.109434  0.305490
1   0.248682  1.472628  0.630074
2  -0.028584 -1.116208  0.074299
3  -0.784692 -0.774261 -1.117499
4        NaN  0.283084 -1.495734
5        NaN -0.074763 -0.840403
6        NaN  0.709794 -1.000048
7        NaN  0.920943  0.681230
8  -0.701831  0.547689 -0.128996
9  -0.455691  0.610016  0.420240
10 -0.856768 -1.039719       NaN
11  1.187208  0.964340       NaN
12  0.116258       NaN  1.119744
13 -0.501180       NaN  0.558941
14  0.551586       NaN -0.993749

or even better keep the first value and change the rest to NaN.或者甚至更好地保留第一个值并将其余值更改为 NaN。 Like this:像这样:

            0          1         2
0    1.239916   1.109434  0.305490
1    0.248682   1.472628  0.630074
2   -0.028584  -1.116208  0.074299
3   -0.784692  -0.774261 -1.117499
4   40.000000   0.283084 -1.495734
5         NaN  -0.074763 -0.840403
6         NaN   0.709794 -1.000048
7         NaN   0.920943  0.681230
8   -0.701831   0.547689 -0.128996
9   -0.455691   0.610016  0.420240
10  -0.856768  -1.039719  0.230000
11   1.187208   0.964340       NaN
12   0.116258  22.000000  1.119744
13  -0.501180        NaN  0.558941
14   0.551586        NaN -0.993749

using shift & mask:使用移位和掩码:

df.shift(1) == df compares the next row to the current for consecutive duplicates. df.shift(1) == df将下一行与当前的连续重复进行比较。

df.mask(df.shift(1) == df)

# outputs    
            0          1         2
0    0.365329   0.153527  0.143244
1    0.688364   0.495755  1.065965
2    0.354180  -0.023518  3.338483
3   -0.106851   0.296802 -0.594785
4   40.000000   0.149378  1.507316
5         NaN  -1.312952  0.225137
6         NaN  -0.242527 -1.731890
7         NaN   0.798908  0.654434
8    2.226980  -1.117809 -1.172430
9   -1.228234  -3.129854 -1.101965
10   0.393293   1.682098  0.230000
11  -0.029907  -0.502333       NaN
12   0.107994  22.000000  0.354902
13  -0.478481        NaN  0.531017
14  -1.517769        NaN  1.552974

if you want to remove all the consecutive duplicates, test that the previous row is also the same as the current row如果要删除所有连续的重复项,请测试前一行是否也与当前行相同

df.mask((df.shift(1) == df) | (df.shift(-1) == df))

Option 1选项 1
Specialized solution using diff .使用diff专业解决方案。 Get's at the final desired output.获得最终所需的输出。

df.mask(df.diff().eq(0))

            0          1         2
0    1.239916   1.109434  0.305490
1    0.248682   1.472628  0.630074
2   -0.028584  -1.116208  0.074299
3   -0.784692  -0.774261 -1.117499
4   40.000000   0.283084 -1.495734
5         NaN  -0.074763 -0.840403
6         NaN   0.709794 -1.000048
7         NaN   0.920943  0.681230
8   -0.701831   0.547689 -0.128996
9   -0.455691   0.610016  0.420240
10  -0.856768  -1.039719  0.230000
11   1.187208   0.964340       NaN
12   0.116258  22.000000  1.119744
13  -0.501180        NaN  0.558941
14   0.551586        NaN -0.993749

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM