How to replace values of a column when values of all columns except two match in pandas?
I have a dataframe that looks like this:
iv_1 iv_2 iv_3 iv_4 iv_5 col2rplc idenifier
0 0 0 0 0 0 a 1
333 0 0 0 0 0 b 0
......
222 1 2 3 4 5 aa 1
324 1 2 3 4 5 cc 0
......
1234 1 0 0 0 1 a 1
1235 0 2 0 4 0 a 1
1236 0 0 3 0 0 a 1
1237 0 0 1 0 0 b 0
1238 0 2 0 2 0 b 0
1239 3 0 0 0 3 b 0
This is two pandas dataframes concatenated. The idenifier column identifies which set a particular row comes from, set_1 or set_0. For every set_0 row whose values in all columns other than col2rplc and idenifier match those of a set_1 row, I would like to replace its col2rplc value with that of the matching set_1 row. So, in the above example, for the first two rows, I would like b to be replaced with a, and I would like cc to be replaced with aa, while all the remaining rows of col2rplc, where no other row has the same values, stay intact.
How do I do this?
Use duplicated to identify duplicate rows, then mask and ffill:
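# sort the data accordingly, so that within each group of identical iv_* values
# the set_1 row (idenifier == 1) comes first
df = df.sort_values(['iv_1','iv_2','iv_3','iv_4','iv_5', 'idenifier'],
                    ascending=False)
# flag every row whose iv_* values repeat an earlier row
mask = df.duplicated(df.columns[:5])
# blank the flagged rows, then forward-fill from the row above
df['col2rplc'] = df['col2rplc'].mask(mask).ffill()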
Output (notice you have an extra duplicate in the last few rows that you didn't mention in your question):
iv_1 iv_2 iv_3 iv_4 iv_5 col2rplc idenifier
0 0 0 0 0 0 a 1
222 1 2 3 4 5 aa 1
324 1 2 3 4 5 aa 0
333 0 0 0 0 0 a 0
1234 1 0 0 0 1 a 1
1235 0 2 0 2 0 a 1
1236 0 0 3 0 0 a 1
1237 0 0 1 0 0 b 0
1238 0 2 0 2 0 a 0
1239 3 0 0 0 3 b 0
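For anyone who wants to try the approach end to end, here is a minimal self-contained sketch. The sample frame is a hypothetical subset modelled on the rows in the question, and the names iv_cols, ordered, is_dup and result are illustrative, not part of the original answer. The final sort_index() is an added step, assuming you want the rows back in index order as in the output shown above.

```python
import pandas as pd

# Hypothetical sample modelled on the question's rows (not the full data).
df = pd.DataFrame(
    {
        "iv_1": [0, 0, 1, 1, 1, 0],
        "iv_2": [0, 0, 2, 2, 0, 0],
        "iv_3": [0, 0, 3, 3, 0, 1],
        "iv_4": [0, 0, 4, 4, 0, 0],
        "iv_5": [0, 0, 5, 5, 1, 0],
        "col2rplc": ["a", "b", "aa", "cc", "a", "b"],
        "idenifier": [1, 0, 1, 0, 1, 0],
    },
    index=[0, 333, 222, 324, 1234, 1237],
)

iv_cols = ["iv_1", "iv_2", "iv_3", "iv_4", "iv_5"]

# Sort so rows with identical iv_* values are adjacent and, within each
# such group, the set_1 row (idenifier == 1) comes first.
ordered = df.sort_values(iv_cols + ["idenifier"], ascending=False)

# True for every row whose iv_* values repeat an earlier row.
is_dup = ordered.duplicated(subset=iv_cols)

# Blank out the repeated rows, then forward-fill from the row above.
ordered["col2rplc"] = ordered["col2rplc"].mask(is_dup).ffill()

# Added step (assumption): restore the original index order.
result = ordered.sort_index()
print(result)
```

Note that duplicated only looks at the iv_* columns, so two rows from the same set that share those values are also treated as duplicates, and the later one takes the earlier one's col2rplc value. If that is not wanted, the duplicates within each set would need to be handled separately first.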