
Multiple conditions in a Python data frame

My data frame looks like this:

id          code    
1            AA
2            BB
3            CC
4            AA
5            GG
6            BB
7            NN
8            YY

My desired output looks like this:

id          code         group  
1            AA            A
2            BB            B
3            CC            A
4            AA            A
5            GG            G
6            BB            B
7            NN            other
8            YY            G

My code looks like this:

col         = 'code'
# Parenthesize each comparison and combine with | (or): a code cannot equal two values at once
conditions  = [ (df[col] == 'AA') | (df[col] == 'CC'), (df[col] == 'GG') | (df[col] == 'YY'), df[col] == 'BB' ]
choices     = [ 'A', 'G', 'B' ]

df["group"] = np.select(conditions, choices, default='other')

But the code column has a large number of categories, around 40. Some categories belong to A, some to B, some to G, and the rest belong to other. I think I need to create a list of codes for each group in the conditions section and then apply it. Otherwise it is very difficult to do with the code above.
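The list-per-group idea described above could be sketched with Series.isin, which tests membership in a list and so avoids chaining comparisons by hand. This is only a minimal sketch: the `groups` dictionary and the sample frame are assumptions built from the example data in the question, not the real 40 categories.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": range(1, 9),
    "code": ["AA", "BB", "CC", "AA", "GG", "BB", "NN", "YY"],
})

col = "code"

# Hypothetical lookup: one list of codes per group
groups = {"A": ["AA", "CC"], "G": ["GG", "YY"], "B": ["BB"]}

# One boolean mask per group, in the same order as the choices
conditions = [df[col].isin(codes) for codes in groups.values()]
choices = list(groups.keys())

# Rows matching no mask receive the default label
df["group"] = np.select(conditions, choices, default="other")
print(df)
```

Adding a category then only means extending one of the lists, not writing another comparison.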

Use Series.map with a dictionary, then replace the non-matched values with a default value using Series.fillna:

d = {'AA':'A','CC':'A','GG':'G','YY':'G','BB':'B'}

df["group"] = df[col].map(d).fillna('other')
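To make the two steps visible, here is a self-contained sketch that rebuilds the sample frame from the question and keeps the intermediate result. The name `mapped` is introduced only for illustration. Codes missing from the dictionary come back from map as NaN, and fillna is what turns them into the default label.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": range(1, 9),
    "code": ["AA", "BB", "CC", "AA", "GG", "BB", "NN", "YY"],
})

col = "code"
d = {"AA": "A", "CC": "A", "GG": "G", "YY": "G", "BB": "B"}

# Step 1: look each code up in the dictionary; "NN" has no key, so it becomes NaN
mapped = df[col].map(d)

# Step 2: replace the NaN values left by unmatched codes with the default label
df["group"] = mapped.fillna("other")
print(df)
```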

If the dictionary is in a different format, with each group as a key and a list of codes as its value, first convert it to the code-to-group format used in the solution above:

d1 = {'A': ['AA','CC'], 'G':['GG','YY'], 'B':['BB']}

#swap key values in dict
#http://stackoverflow.com/a/31674731/2901002
d = {k: oldk for oldk, oldv in d1.items() for k in oldv}
df["group"] = df[col].map(d).fillna('other')
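With around 40 codes it is easy to list one code under two groups by mistake, and the dictionary comprehension would then silently keep the last group it sees. One possible variation, shown here as a sketch and not as part of the original answer, is to invert the dictionary with an explicit loop that raises an error on duplicates. The sample frame is again taken from the question.

```python
import pandas as pd

# Group-to-codes format, as in the answer above
d1 = {"A": ["AA", "CC"], "G": ["GG", "YY"], "B": ["BB"]}

# Invert to code-to-group, refusing codes that appear under more than one group
d = {}
for group, codes in d1.items():
    for code in codes:
        if code in d:
            raise ValueError(f"code {code!r} is listed under both {d[code]!r} and {group!r}")
        d[code] = group

# Sample data copied from the question
df = pd.DataFrame({
    "id": range(1, 9),
    "code": ["AA", "BB", "CC", "AA", "GG", "BB", "NN", "YY"],
})

df["group"] = df["code"].map(d).fillna("other")
print(df)
```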
