Pandas replace rows based on multiple conditions

Take, for example, the following DataFrame:

df = pd.DataFrame({"val":np.random.rand(8),
                   "id1":[1,2,3,4,1,2,3,4],
                   "id2":[1,2,1,2,2,1,2,2],
                   "id3":[1,1,1,1,2,2,2,2]})

I would like to replace the id2 values in the rows where id3 does not equal an arbitrary reference value with the id2 values of the reference rows that have the same id1.

I have a solution which partially works, but it does not use the second condition (matching each row on id1 to pick the id2 value from the rows where id3 equals the reference). This prevents my solution from being very robust, as discussed below.

import pandas as pd
import numpy as np

df = pd.DataFrame({"val":np.random.rand(8),
                   "id1":[1,2,3,4,1,2,3,4],
                   "id2":[1,2,1,2,2,1,2,2],
                   "id3":[1,1,1,1,2,2,2,2]})

reference = 1
df.loc[df['id3'] != reference, "id2"] = df[df["id3"]==reference]["id2"].values
print(df)

Output:

        val  id1  id2  id3
0  0.580965    1    1    1
1  0.941297    2    2    1
2  0.001142    3    1    1
3  0.479363    4    2    1
4  0.732861    1    1    2
5  0.650075    2    2    2
6  0.776919    3    1    2
7  0.377657    4    2    2

This solution does work, but only when id3 has exactly two distinct values. If there are three id3 values, i.e.

df = pd.DataFrame({"val":np.random.rand(12),
                   "id1":[1,2,3,4,1,2,3,4,1,2,3,4],
                   "id2":[1,2,1,2,2,1,2,2,1,1,2,2],
                   "id3":[1,1,1,1,2,2,2,2,3,3,3,3]})

Expected/desired output:

         val  id1  id2  id3
0   0.800934    1    1    1
1   0.505645    2    2    1
2   0.268300    3    1    1
3   0.295300    4    2    1
4   0.564372    1    1    2
5   0.154572    2    2    2
6   0.591691    3    1    2
7   0.896055    4    2    2
8   0.275267    1    1    3
9   0.840533    2    2    3
10  0.192257    3    1    3
11  0.543342    4    2    3

then, unfortunately, my solution stops working. If anyone could provide some tips on how to get around this issue, I would be very appreciative.
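
The likely cause is a length mismatch: with three id3 values the mask on the left-hand side selects eight rows, while the right-hand side still supplies only the four id2 values of the reference group. A minimal sketch reproducing this, with the val column left out for brevity (the names target_rows, source_values and failed are only illustrative, and the exact error message depends on the pandas version, so it is not shown):

```python
import pandas as pd

df = pd.DataFrame({"id1": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   "id2": [1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2],
                   "id3": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]})

reference = 1

# Number of rows that the assignment tries to overwrite
target_rows = int((df["id3"] != reference).sum())

# Values offered on the right-hand side (reference rows only)
source_values = df[df["id3"] == reference]["id2"].values

try:
    # Positional assignment: needs as many values as selected rows
    df.loc[df["id3"] != reference, "id2"] = source_values
    failed = False
except ValueError:
    failed = True

print(target_rows, len(source_values), failed)
```

With exactly two id3 values both counts happen to be equal, which is why the original example runs.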

If the id1 column acts like a counter within each group, first create a helper Series from the reference group by filtering and DataFrame.set_index, and then use Series.map:

reference = 1
s = df[df['id3'] == reference].set_index('id1')['id2']
df['id2'] = df['id1'].map(s)
print (df)
         val  id1  id2  id3
0   0.986277    1    1    1
1   0.873392    2    2    1
2   0.509746    3    1    1
3   0.271836    4    2    1
4   0.336919    1    1    2
5   0.216954    2    2    2
6   0.276477    3    1    2
7   0.343316    4    2    2
8   0.862159    1    1    3
9   0.156700    2    2    3
10  0.140887    3    1    3
11  0.757080    4    2    3
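
To see why this works, it may help to look at the helper Series as a lookup table. The following is a minimal, self-contained sketch on the three-group data from the question (val is omitted since it plays no role). It assumes each id1 value appears exactly once within the reference group, because Series.map needs a unique index for the lookup; the names lookup and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"id1": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   "id2": [1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2],
                   "id3": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]})

reference = 1

# Lookup table: index is id1, values are id2 taken from the reference rows only
s = df[df["id3"] == reference].set_index("id1")["id2"]
lookup = s.to_dict()

# Every row looks up its own id1 in the table, whatever its id3 is
df["id2"] = df["id1"].map(s)
result = df["id2"].tolist()

print(lookup)
print(result)
```

Because the match is made on id1 labels rather than on row position, the number of distinct id3 values no longer matters. An id1 value that is missing from the reference group would come back as NaN.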

If there is no such counter column, create a new one with GroupBy.cumcount:

reference = 1

df['g'] = df.groupby('id3').cumcount()
s = df[df['id3'] == reference].set_index('g')['id2']
df['id2'] = df['g'].map(s)
print (df)
         val  id1  id2  id3  g
0   0.986277    1    1    1  0
1   0.873392    2    2    1  1
2   0.509746    3    1    1  2
3   0.271836    4    2    1  3
4   0.336919    1    1    2  0
5   0.216954    2    2    2  1
6   0.276477    3    1    2  2
7   0.343316    4    2    2  3
8   0.862159    1    1    3  0
9   0.156700    2    2    3  1
10  0.140887    3    1    3  2
11  0.757080    4    2    3  3
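
If the extra g column is not wanted in the result, the position counter can be kept as a standalone Series instead of being stored in the DataFrame. A minimal sketch, assuming every id3 group lists its rows in the same order and no group is longer than the reference group (extra rows would get NaN); ref_mask is a name introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({"id1": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
                   "id2": [1, 2, 1, 2, 2, 1, 2, 2, 1, 1, 2, 2],
                   "id3": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]})

reference = 1

# Position of each row within its id3 group (0, 1, 2, ...), kept outside df
g = df.groupby("id3").cumcount()

# Reference id2 values, indexed by their position inside the reference group
ref_mask = df["id3"] == reference
s = pd.Series(df.loc[ref_mask, "id2"].to_numpy(),
              index=g[ref_mask].to_numpy())

# Map each row's position to the reference value at the same position
df["id2"] = g.map(s)

print(df)
```

This gives the same id2 values as the version above while leaving the columns of df unchanged.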
