Pandas - compare 2 columns and choose value based on Priority
Below is my input dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Level_DB': ['Level 1 Experienced', 'Level 2 Expert', 'Level 1 Experienced', 'Level 2 Expert', 'Level 3 Thought Leader', 'Level 1 Experienced', 'Non-Certified', 'Level 3 Thought Leader', 'Certified', 'Certified', np.nan, 'Level 1 Experienced'],
'Level_Legacy': ['Certified', 'Level 1 Experienced', 'Level 3 Thought Leader', 'Level 3 Thought Leader Recert', 'Level 3 Thought Leader Recert', 'Non-Certified', 'non-certified', 'Level 2 Expert Recert', 'Level 1 Experienced', 'Non-Certified', 'Certified', '']})
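The target column "Output" should be generated by comparing the input columns "Level_DB" and "Level_Legacy" and choosing the value with the highest priority. The priority list, from highest to lowest, is as follows: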
priority_List = ['Level 3 Thought Leader', 'Level 3 Thought Leader New', 'Level 3 Thought Leader Recert', 'Level 3 Thought Leader Recert Lapsed',
'Level 2 Expert', 'Level 2 Expert New', 'Level 2 Expert Recert', 'Level 2 Expert Recert Lapsed',
'Level 1 Experienced', 'Level 1 Experienced New', 'Level 1 Experienced Recert', 'Level 1 Experienced Recert Lapsed', 'Certified', 'Non-Certified' , 'non-certified']
The expected final DataFrame, including the required "Output" column, is as follows
I can't think of how to even get started. Please help.
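The idea is to create an ordered categorical and reshape with DataFrame.stack, so that the output is the max of each level=0 group (that is, of each original row):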
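from pandas.api.types import CategoricalDtype
# Categories are ordered from lowest to highest, so reverse the priority list
cat_type = CategoricalDtype(categories=priority_List[::-1], ordered=True)
# solution if there are more columns in the data
#df['Output'] = df[['Level_DB','Level_Legacy']].stack().astype(cat_type).groupby(level=0).max()
# max(level=0) was removed in pandas 2.0; groupby(level=0).max() is the equivalent
df['Output'] = df.stack().astype(cat_type).groupby(level=0).max()
print (df)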
Level_DB Level_Legacy \
0 Level 1 Experienced Certified
1 Level 2 Expert Level 1 Experienced
2 Level 1 Experienced Level 3 Thought Leader
3 Level 2 Expert Level 3 Thought Leader Recert
4 Level 3 Thought Leader Level 3 Thought Leader Recert
5 Level 1 Experienced Non-Certified
6 Non-Certified non-certified
7 Level 3 Thought Leader Level 2 Expert Recert
8 Certified Level 1 Experienced
9 Certified Non-Certified
10 NaN Certified
11 Level 1 Experienced
Output
0 Level 1 Experienced
1 Level 2 Expert
2 Level 3 Thought Leader
3 Level 3 Thought Leader Recert
4 Level 3 Thought Leader
5 Level 1 Experienced
6 Non-Certified
7 Level 3 Thought Leader
8 Level 1 Experienced
9 Certified
10 Certified
11 Level 1 Experienced
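To illustrate why the priority list is reversed before it is turned into categories, here is a minimal sketch on a small made-up Series. The labels 'Gold', 'Silver' and 'Bronze' and the names `priority`, `s`, `highest` and `codes` are hypothetical and not part of the question's data. An ordered categorical ranks its categories from lowest to highest, so the highest-priority label has to come last for max to pick it.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical priority list, highest priority first (same convention as priority_List).
priority = ['Gold', 'Silver', 'Bronze']

# Ordered categories run from lowest to highest, hence the reversal.
cat_type = CategoricalDtype(categories=priority[::-1], ordered=True)

# 'unknown' is not a category, so it becomes NaN after the cast.
s = pd.Series(['Silver', 'Gold', 'Bronze', 'unknown']).astype(cat_type)

highest = s.max()             # NaN is skipped; the highest-ranked label wins
codes = s.cat.codes.tolist()  # integer rank of each value, -1 marks NaN

print(highest)
print(codes)
```

The same mechanism explains why the empty string in row 11 of the question's data does not affect the result: it is not in the category list, so it is treated as missing.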
We can use Series.map here by enumerating your priority_List and taking the lowest index, which corresponds to the highest priority:
dct_priority = {j:i for i, j in enumerate(priority_List)}
dct_priority_reverse = {i:j for i, j in enumerate(priority_List)}
df['Output'] = df.apply(lambda x: x.map(dct_priority)).min(axis=1).map(dct_priority_reverse)
Level_DB Level_Legacy Output
0 Level 1 Experienced Certified Level 1 Experienced
1 Level 2 Expert Level 1 Experienced Level 2 Expert
2 Level 1 Experienced Level 3 Thought Leader Level 3 Thought Leader
3 Level 2 Expert Level 3 Thought Leader Recert Level 3 Thought Leader Recert
4 Level 3 Thought Leader Level 3 Thought Leader Recert Level 3 Thought Leader
5 Level 1 Experienced Non-Certified Level 1 Experienced
6 Non-Certified non-certified Non-Certified
7 Level 3 Thought Leader Level 2 Expert Recert Level 3 Thought Leader
8 Certified Level 1 Experienced Level 1 Experienced
9 Certified Non-Certified Certified
10 NaN Certified Certified
11 Level 1 Experienced Level 1 Experienced
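The mapping approach above can also be wrapped in a small helper that restricts the comparison to named columns, which may be useful when the DataFrame holds other columns too. This is only a sketch under stated assumptions: the function name `pick_by_priority`, the `demo` frame and the shortened `priority` list are hypothetical, and values missing from the priority list are assumed to count as absent.

```python
import numpy as np
import pandas as pd

def pick_by_priority(frame, columns, priority):
    """Return, per row, the value among `columns` that appears earliest in `priority`."""
    # Position in the list is the rank: a lower number means higher priority.
    rank = {label: pos for pos, label in enumerate(priority)}
    # Values not found in the list (NaN, '', typos) map to NaN.
    ranks = frame[columns].apply(lambda col: col.map(rank))
    # Row-wise minimum rank, ignoring NaN; all-NaN rows stay NaN.
    best = ranks.min(axis=1)
    # Translate the winning rank back to its label.
    return best.map(lambda pos: priority[int(pos)] if pd.notna(pos) else np.nan)

# Hypothetical, shortened priority list and data for illustration only.
priority = ['Level 2 Expert', 'Level 1 Experienced', 'Certified', 'Non-Certified']
demo = pd.DataFrame({
    'a': ['Certified', 'Level 2 Expert', np.nan, ''],
    'b': ['Level 1 Experienced', 'Non-Certified', 'Certified', 'unknown'],
})

result = pick_by_priority(demo, ['a', 'b'], priority).tolist()
print(result)
```

Selecting the columns explicitly avoids accidentally including an existing "Output" column in the comparison if the code is run a second time.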