How to replace a column's values in pandas when all columns except two match?

Question

I have a dataframe that looks like this:

       iv_1  iv_2  iv_3  iv_4  iv_5  col2rplc  idenifier
0       0      0     0     0     0      a          1
333     0      0     0     0     0      b          0
      ......
222     1      2     3     4     5      aa         1
324     1      2     3     4     5      cc         0
      ......
1234    1      0     0     0     1      a          1
1235    0      2     0     4     0      a          1
1236    0      0     3     0     0      a          1
1237    0      0     1     0     0      b          0
1238    0      2     0     2     0      b          0
1239    3      0     0     0     3      b          0

This is two pandas dataframes concatenated. The idenifier column identifies which set a particular row is from, set_1 or set_0. For every set_0 row whose iv_ columns all match those of a set_1 row, I would like to replace the value of col2rplc with the value from that set_1 row. So, in the above example, for the first two rows, I would like b to be replaced with a, and I would like cc to be replaced with aa, while all the remaining rows of column col2rplc, where no other row has the same values, stay intact.

How do I do this?

Answer 1

Use duplicated to identify duplicate rows, then mask and ffill:

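# Sort so that matching rows are adjacent and, within each group,
# the set_1 row (idenifier == 1) comes first.
df = df.sort_values(['iv_1','iv_2','iv_3','iv_4','iv_5', 'idenifier'],
                    ascending=False)

# Flag every row whose first five (iv_) columns repeat an earlier row.
mask = df.duplicated(df.columns[:5])
# Blank out the flagged values, then forward-fill from the row above.
df['col2rplc'] = df['col2rplc'].mask(mask).ffill()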

Output (notice that the last few rows contain an extra duplicate that you didn't mention in your question):

      iv_1  iv_2  iv_3  iv_4  iv_5 col2rplc  idenifier
0        0     0     0     0     0        a          1
222      1     2     3     4     5       aa          1
324      1     2     3     4     5       aa          0
333      0     0     0     0     0        a          0
1234     1     0     0     0     1        a          1
1235     0     2     0     2     0        a          1
1236     0     0     3     0     0        a          1
1237     0     0     1     0     0        b          0
1238     0     2     0     2     0        a          0
1239     3     0     0     0     3        b          0
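
For anyone who wants to run this end to end, here is a minimal sketch. The frame is rebuilt by hand from the output table above (iv_ columns and idenifier) with the col2rplc values from the question, so treat the data as an assumption. The names key_cols, out and dup are only illustrative, and the final sort_index call is an addition so that the rows line up with the table above.

```python
import pandas as pd

# Sample data rebuilt from the tables in this thread (an assumption, not
# the asker's real data). 'idenifier' keeps the thread's spelling:
# 1 marks a set_1 row, 0 marks a set_0 row.
df = pd.DataFrame(
    {
        "iv_1": [0, 0, 1, 1, 1, 0, 0, 0, 0, 3],
        "iv_2": [0, 0, 2, 2, 0, 2, 0, 0, 2, 0],
        "iv_3": [0, 0, 3, 3, 0, 0, 3, 1, 0, 0],
        "iv_4": [0, 0, 4, 4, 0, 2, 0, 0, 2, 0],
        "iv_5": [0, 0, 5, 5, 1, 0, 0, 0, 0, 3],
        "col2rplc": ["a", "b", "aa", "cc", "a", "a", "a", "b", "b", "b"],
        "idenifier": [1, 0, 1, 0, 1, 1, 1, 0, 0, 0],
    },
    index=[0, 333, 222, 324, 1234, 1235, 1236, 1237, 1238, 1239],
)

key_cols = ["iv_1", "iv_2", "iv_3", "iv_4", "iv_5"]

# Descending sort puts rows with equal keys next to each other, with the
# set_1 row (idenifier == 1) ahead of its set_0 match.
out = df.sort_values(key_cols + ["idenifier"], ascending=False)

# True for every row whose key columns repeat an earlier row.
dup = out.duplicated(key_cols)

# Replace the flagged values with NaN, then copy down the value above.
out["col2rplc"] = out["col2rplc"].mask(dup).ffill()

# Back to index order for display.
out = out.sort_index()
print(out)
```

One thing to keep in mind: the forward fill copies from whatever row sits directly above in the sorted frame. If two set_0 rows share the same iv_ values and no set_1 row matches them, the second one would pick up the first one's value, so it may be worth checking for that case in real data.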

Solution 1
1 accepted 2020-11-18 16:58:55
