[英]Pandas - Add values from series to dataframe column based on index of series matching some value in dataframe
Data数据
pb = {"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222"},"mark_up":{"0":1.2987,"1":1.5625,"2":1.3698,"3":1.3333,"4":1.4589}}
data = {"id":{"0":"K69","1":"K70","2":"K71","3":"K72","4":"K73","5":"K74","6":"K75","7":"K79","8":"K86","9":"K100"},"cost":{"0":29.74,"1":9.42,"2":9.42,"3":9.42,"4":9.48,"5":9.48,"6":24.36,"7":5.16,"8":9.8,"9":3.28},"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222","5":"333","6":"444","7":"555","8":"666","9":"777"}}
pb = pd.DataFrame(data=pb).set_index('mark_up_id')
df = pd.DataFrame(data=data)
Expected Output预期产出
test = df.join(pb, on='mark_up_id', how='left')
test['cost'].update(test['cost'] + test['mark_up'])
test.drop('mark_up',axis=1,inplace=True)
Or..或者..
df['cost'].update(df['mark_up_id'].map(pb['mark_up']) + df['cost'])
Question题
Is there a function that does the above, or is this the best way to go about this type of operation?是否有执行上述操作的功能,或者这是进行此类操作的最佳方法?
I would use the second solution you propose or better this:我会使用您提出的第二种解决方案或更好的解决方案:
df['cost']=(df['mark_up_id'].map(pb['mark_up']) + df['cost']).fillna(df['cost'])
I think using update can be uncomfortable because it doesn't return anything.我认为使用 update 可能不舒服,因为它不会返回任何东西。
Let's say Series.fillna
is more flexible.假设Series.fillna
更灵活。
We can also use DataFrame.assign
in order to continue working on the DataFrame that the assignment returns.我们还可以使用DataFrame.assign
继续处理分配返回的 DataFrame。
df.assign( Cost=(df['mark_up_id'].map(pb['mark_up']) + df['cost']).fillna(df['cost']) )
Time comparision with join
method与join
方法的时间比较
%%timeit
df['cost']=(df['mark_up_id'].map(pb['mark_up']) + df['cost']).fillna(df['cost'])
#945 µs ± 46 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
test = df.join(pb, on='mark_up_id', how='left')
test['cost'].update(test['cost'] + test['mark_up'])
test.drop('mark_up',axis=1,inplace=True)
#3.59 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
slow..减缓..
%%timeit
df['cost'].update(df['mark_up_id'].map(pb['mark_up']) + df['cost'])
#985 µs ± 32.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Finally,I recommend you see: Underastanding inplace
and When I should use apply
最后,我建议您查看: Underastanding inplace
和When I should use apply
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.