How to make a new column based on comparing two dataframes' unique IDs?

Data

Hi, I have two dataframes, each with an OrderID and a stage number. I want to create a new column in the August dataframe that holds July's stage number wherever the OrderID matches. If there is no match, it should return "N/A".

How should I use lambda and apply functions to create this column? (Don't use join...)

Any clues and suggestions will be appreciated! Thanks!

You can use pd.Series.map with a series. Note that if the result contains NaN values, the series will be upcast to float, because NaN is a float value. This is unavoidable without adding inefficiencies.

import numpy as np
import pandas as pd

aug = pd.DataFrame({'ID': [111, 222, 333, 444, 555], 'Prior': np.nan})
jul = pd.DataFrame({'ID': [222, 333, 444, 555, 666, 777], 'Stage': [1, 2, 3, 4, 5, 6]})

aug['Prior'] = aug['ID'].map(jul.set_index('ID')['Stage'])

print(aug)

    ID  Prior
0  111    NaN
1  222    1.0
2  333    2.0
3  444    3.0
4  555    4.0
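
If the float upcast or the NaN placeholder is unwanted, one possible follow-up is an extra conversion pass after the map. The sketch below is not part of the original answer; the column names Prior_int and Prior_label are made up for illustration. It shows pandas' nullable Int64 dtype, which keeps integers alongside missing values, and a plain Python pass that substitutes the "N/A" label the question asks for (at the cost of an object-dtype column).

```python
import pandas as pd

aug = pd.DataFrame({'ID': [111, 222, 333, 444, 555]})
jul = pd.DataFrame({'ID': [222, 333, 444, 555, 666, 777], 'Stage': [1, 2, 3, 4, 5, 6]})

# Same lookup as above: unmatched IDs become NaN, so the result is float64.
prior = aug['ID'].map(jul.set_index('ID')['Stage'])

# Option 1: nullable integer dtype keeps whole numbers and marks gaps as <NA>.
aug['Prior_int'] = prior.astype('Int64')

# Option 2: literal "N/A" for unmatched IDs (hypothetical column name).
# Mixing strings and ints makes this an object column, which is slower.
aug['Prior_label'] = ['N/A' if pd.isna(v) else int(v) for v in prior]

print(aug)
```

Option 2 is convenient for display, but a column that mixes strings and numbers can no longer be used directly in numeric operations.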

A more long-winded solution is possible via pd.Series.update and aligning indices:

# Work on a standalone series so the in-place update is not a chained assignment
prior = aug.set_index('ID')['Prior']
prior.update(jul.set_index('ID')['Stage'])
aug = prior.reset_index()
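
The two approaches differ when the target column already holds values. As a small sketch with made-up placeholder values (9.0 is arbitrary), map rebuilds the column from the lookup alone, while update overwrites only the positions where the aligned lookup has a non-missing value and leaves everything else as it was:

```python
import pandas as pd

aug = pd.DataFrame({'ID': [111, 222, 333], 'Prior': [9.0, 9.0, 9.0]})
jul = pd.DataFrame({'ID': [222, 333, 444], 'Stage': [1, 2, 3]})
lookup = jul.set_index('ID')['Stage']

# map: unmatched ID 111 becomes NaN, regardless of the old value.
mapped = aug['ID'].map(lookup)

# update: unmatched ID 111 keeps its old value 9.0.
updated = aug.set_index('ID')['Prior'].copy()
updated.update(lookup)

print(mapped.tolist())
print(updated.tolist())
```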

Although I hate posting this as an answer, if you are still interested in using lambda and apply, you can do it as below:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Order_id_July': [222, 333, 444, 555, 666, 777], 'stage': [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({'Order_id_August': [111, 222, 333, 444, 555]})

Mapper function (similar to a lookup):

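def myfunc(row):
    # Find July rows whose order ID equals this August order ID
    match = df.loc[df['Order_id_July'] == row['Order_id_August'], 'stage']
    if not match.empty:
        return int(match.iloc[0])  # first matching stage
    return np.nan  # no July record for this order

df2['prior_stage'] = df2.apply(lambda x: myfunc(x), axis=1)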

Output:

Order_id_August prior_stage
    111         NaN
    222         1.0
    333         2.0
    444         3.0
    555         4.0
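
The row-wise function above filters the July dataframe once per August row. A lighter variant that still uses lambda and apply, offered here only as a sketch, builds a plain dictionary first and lets dict.get supply the "N/A" default the question asks for. The name stage_by_id is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Order_id_July': [222, 333, 444, 555, 666, 777], 'stage': [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({'Order_id_August': [111, 222, 333, 444, 555]})

# Build the lookup once: July order ID -> stage.
stage_by_id = dict(zip(df['Order_id_July'], df['stage']))

# Element-wise apply on the ID column; unmatched IDs get the string "N/A".
df2['prior_stage'] = df2['Order_id_August'].apply(lambda x: stage_by_id.get(x, 'N/A'))

print(df2)
```

Because the column then mixes integers with the string "N/A", it is stored as object dtype.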

If you later change your mind and want to explore a better way of achieving this, try the code below:

df2[['Order_id_August']].merge(df, left_on='Order_id_August', right_on='Order_id_July', how='left').drop('Order_id_July', axis=1).rename(columns={'stage': 'prior_stage'})

Order_id_August prior_stage
        111         NaN
        222         1.0
        333         2.0
        444         3.0
        555         4.0
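
One thing to watch with a left merge is that a duplicated order ID in the July data would silently duplicate August rows. A possible safeguard, sketched below with a hypothetical helper named add_prior_stage, is the validate argument of merge, which raises pandas.errors.MergeError when the right-hand keys are not unique.

```python
import pandas as pd

df = pd.DataFrame({'Order_id_July': [222, 333, 444], 'stage': [1, 2, 3]})
df2 = pd.DataFrame({'Order_id_August': [111, 222, 333]})

def add_prior_stage(august, july):
    """Left-merge July stages onto August orders (hypothetical helper)."""
    merged = august.merge(
        july,
        left_on='Order_id_August',
        right_on='Order_id_July',
        how='left',
        validate='m:1',  # fail if a July order ID appears more than once
    )
    return merged.drop(columns='Order_id_July').rename(columns={'stage': 'prior_stage'})

result = add_prior_stage(df2, df)
print(result)

# A July frame with a repeated ID is rejected instead of duplicating rows.
july_with_dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
try:
    add_prior_stage(df2, july_with_dup)
    raised = False
except pd.errors.MergeError:
    raised = True
print(raised)
```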
