[英]How to change consecutive repeating values in pandas dataframe series to nan or 0?
I have a pandas dataframe created from measured numbers.我有一个根据测量数字创建的熊猫数据框。 When something goes wrong with the measurement, the last value is repeated.
当测量出现问题时,会重复最后一个值。 I would like to do two things:
我想做两件事:
1. Change all repeating values either to nan or 0. 1. 将所有重复值更改为 nan 或 0。
2. Keep the first repeating value and change all other values nan or 0. 2. 保留第一个重复值并将所有其他值更改为 nan 或 0。
I have found solutions using "shift" but they drop repeating values.我找到了使用“shift”的解决方案,但它们会删除重复值。 I do not want to drop repeating values.My data frame looks like this:
我不想删除重复值。我的数据框如下所示:
df = pd.DataFrame(np.random.randn(15, 3))
df.iloc[4:8,0]=40
df.iloc[12:15,1]=22
df.iloc[10:12,2]=0.23
giving a dataframe like this:给出这样的数据框:
0 1 2
0 1.239916 1.109434 0.305490
1 0.248682 1.472628 0.630074
2 -0.028584 -1.116208 0.074299
3 -0.784692 -0.774261 -1.117499
4 40.000000 0.283084 -1.495734
5 40.000000 -0.074763 -0.840403
6 40.000000 0.709794 -1.000048
7 40.000000 0.920943 0.681230
8 -0.701831 0.547689 -0.128996
9 -0.455691 0.610016 0.420240
10 -0.856768 -1.039719 0.230000
11 1.187208 0.964340 0.230000
12 0.116258 22.000000 1.119744
13 -0.501180 22.000000 0.558941
14 0.551586 22.000000 -0.993749
what I would like to be able to do is write some code that would filter the data and give me a data frame like this:我希望能够做的是编写一些代码来过滤数据并给我一个这样的数据框:
0 1 2
0 1.239916 1.109434 0.305490
1 0.248682 1.472628 0.630074
2 -0.028584 -1.116208 0.074299
3 -0.784692 -0.774261 -1.117499
4 NaN 0.283084 -1.495734
5 NaN -0.074763 -0.840403
6 NaN 0.709794 -1.000048
7 NaN 0.920943 0.681230
8 -0.701831 0.547689 -0.128996
9 -0.455691 0.610016 0.420240
10 -0.856768 -1.039719 NaN
11 1.187208 0.964340 NaN
12 0.116258 NaN 1.119744
13 -0.501180 NaN 0.558941
14 0.551586 NaN -0.993749
or even better keep the first value and change the rest to NaN.或者甚至更好地保留第一个值并将其余值更改为 NaN。 Like this:
像这样:
0 1 2
0 1.239916 1.109434 0.305490
1 0.248682 1.472628 0.630074
2 -0.028584 -1.116208 0.074299
3 -0.784692 -0.774261 -1.117499
4 40.000000 0.283084 -1.495734
5 NaN -0.074763 -0.840403
6 NaN 0.709794 -1.000048
7 NaN 0.920943 0.681230
8 -0.701831 0.547689 -0.128996
9 -0.455691 0.610016 0.420240
10 -0.856768 -1.039719 0.230000
11 1.187208 0.964340 NaN
12 0.116258 22.000000 1.119744
13 -0.501180 NaN 0.558941
14 0.551586 NaN -0.993749
using shift & mask:使用移位和掩码:
df.shift(1) == df
compares the next row to the current for consecutive duplicates. df.shift(1) == df
将下一行与当前的连续重复进行比较。
df.mask(df.shift(1) == df)
# outputs
0 1 2
0 0.365329 0.153527 0.143244
1 0.688364 0.495755 1.065965
2 0.354180 -0.023518 3.338483
3 -0.106851 0.296802 -0.594785
4 40.000000 0.149378 1.507316
5 NaN -1.312952 0.225137
6 NaN -0.242527 -1.731890
7 NaN 0.798908 0.654434
8 2.226980 -1.117809 -1.172430
9 -1.228234 -3.129854 -1.101965
10 0.393293 1.682098 0.230000
11 -0.029907 -0.502333 NaN
12 0.107994 22.000000 0.354902
13 -0.478481 NaN 0.531017
14 -1.517769 NaN 1.552974
if you want to remove all the consecutive duplicates, test that the previous row is also the same as the current row如果要删除所有连续的重复项,请测试前一行是否也与当前行相同
df.mask((df.shift(1) == df) | (df.shift(-1) == df))
Option 1选项 1
Specialized solution using diff
.使用
diff
专业解决方案。 Get's at the final desired output.获得最终所需的输出。
df.mask(df.diff().eq(0))
0 1 2
0 1.239916 1.109434 0.305490
1 0.248682 1.472628 0.630074
2 -0.028584 -1.116208 0.074299
3 -0.784692 -0.774261 -1.117499
4 40.000000 0.283084 -1.495734
5 NaN -0.074763 -0.840403
6 NaN 0.709794 -1.000048
7 NaN 0.920943 0.681230
8 -0.701831 0.547689 -0.128996
9 -0.455691 0.610016 0.420240
10 -0.856768 -1.039719 0.230000
11 1.187208 0.964340 NaN
12 0.116258 22.000000 1.119744
13 -0.501180 NaN 0.558941
14 0.551586 NaN -0.993749
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.