Pandas - Add values from a Series to a DataFrame column where the Series index matches a value in the DataFrame

Question

Data

import pandas as pd
pb = {"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222"},"mark_up":{"0":1.2987,"1":1.5625,"2":1.3698,"3":1.3333,"4":1.4589}}
data = {"id":{"0":"K69","1":"K70","2":"K71","3":"K72","4":"K73","5":"K74","6":"K75","7":"K79","8":"K86","9":"K100"},"cost":{"0":29.74,"1":9.42,"2":9.42,"3":9.42,"4":9.48,"5":9.48,"6":24.36,"7":5.16,"8":9.8,"9":3.28},"mark_up_id":{"0":"123","1":"456","2":"789","3":"111","4":"222","5":"333","6":"444","7":"555","8":"666","9":"777"}}
pb = pd.DataFrame(data=pb).set_index('mark_up_id')
df = pd.DataFrame(data=data)

Expected Output

test = df.join(pb, on='mark_up_id', how='left')
test['cost'].update(test['cost'] + test['mark_up'])
test.drop('mark_up',axis=1,inplace=True)

Or:

df['cost'].update(df['mark_up_id'].map(pb['mark_up']) + df['cost'])

Question

Is there a built-in function that does the above, or is this the best way to go about this type of operation?

Answer 1

I would use the second solution you propose, or better, this:

df['cost']=(df['mark_up_id'].map(pb['mark_up']) + df['cost']).fillna(df['cost'])
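To see why the fillna step is needed, here is a minimal self-contained sketch on a cut-down version of the question's data. The three-row frame and the names `marked_up` and `new_cost` are only for illustration. An ID with no entry in pb maps to NaN, the sum is then NaN for that row, and fillna restores the original cost.

```python
import pandas as pd

# Cut-down lookup table, indexed by mark_up_id as in the question.
pb = pd.DataFrame(
    {"mark_up_id": ["123", "456"], "mark_up": [1.2987, 1.5625]}
).set_index("mark_up_id")

# "333" deliberately has no entry in pb.
df = pd.DataFrame(
    {
        "id": ["K69", "K70", "K74"],
        "cost": [29.74, 9.42, 9.48],
        "mark_up_id": ["123", "456", "333"],
    }
)

# Look up each row's mark-up by ID; unmatched IDs become NaN.
marked_up = df["mark_up_id"].map(pb["mark_up"])

# NaN + cost is NaN, so fall back to the original cost for unmatched rows.
new_cost = (marked_up + df["cost"]).fillna(df["cost"])

print(new_cost)
```

Assigning `new_cost` back to `df['cost']` gives the one-liner above.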

I find update awkward to use because it modifies the Series in place and returns None.
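As a small illustration of that behaviour (the values are made up), Series.update changes the Series it is called on, returns None, and skips NaN entries in its argument, so its result cannot be assigned or chained:

```python
import numpy as np
import pandas as pd

cost = pd.Series([29.74, 9.42, 9.48])
# NaN stands for a row where no mark-up was found.
bumped = pd.Series([31.0387, 10.9825, np.nan])

# update() mutates `cost` in place and leaves the NaN position untouched.
result = cost.update(bumped)

print(result)         # None
print(cost.tolist())  # [31.0387, 10.9825, 9.48]
```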

Series.fillna is more flexible, since it returns a new Series that you can assign or chain.

We can also use DataFrame.assign, which returns a new DataFrame, so we can keep working on the result:

df.assign(cost=(df['mark_up_id'].map(pb['mark_up']) + df['cost']).fillna(df['cost']))
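A sketch of what chaining on the result of assign can look like, assuming the goal is to overwrite the existing lowercase cost column (the keyword passed to assign is the column name, so it is case-sensitive). The cut-down data and the sort step are only there for illustration.

```python
import pandas as pd

pb = pd.DataFrame(
    {"mark_up_id": ["123", "456"], "mark_up": [1.2987, 1.5625]}
).set_index("mark_up_id")
df = pd.DataFrame(
    {
        "id": ["K69", "K70", "K74"],
        "cost": [29.74, 9.42, 9.48],
        "mark_up_id": ["123", "456", "333"],
    }
)

result = (
    df
    # The lambda receives the frame being assigned to, so nothing is mutated.
    .assign(
        cost=lambda d: (d["mark_up_id"].map(pb["mark_up"]) + d["cost"]).fillna(d["cost"])
    )
    # Any further method can follow because assign returns a DataFrame.
    .sort_values("cost", ascending=False)
    .reset_index(drop=True)
)

print(result)
```

The original df keeps its old cost values; only `result` carries the marked-up column.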

Time comparison with the join method

%%timeit
df['cost']=(df['mark_up_id'].map(pb['mark_up']) + df['cost']).fillna(df['cost'])
#945 µs ± 46 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
test = df.join(pb, on='mark_up_id', how='left')
test['cost'].update(test['cost'] + test['mark_up'])
test.drop('mark_up',axis=1,inplace=True)
#3.59 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The join version is noticeably slower.

%%timeit
df['cost'].update(df['mark_up_id'].map(pb['mark_up']) + df['cost'])
#985 µs ± 32.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
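The %%timeit cells above only run in IPython or Jupyter. As a rough sketch, the same comparison can be reproduced in a plain script with the standard timeit module. The helper names `with_map` and `with_join` are hypothetical, the data is a cut-down version of the question's, and the absolute numbers will differ from machine to machine.

```python
import timeit

import pandas as pd

pb = pd.DataFrame(
    {"mark_up_id": ["123", "456"], "mark_up": [1.2987, 1.5625]}
).set_index("mark_up_id")
df = pd.DataFrame(
    {
        "id": ["K69", "K70", "K74"],
        "cost": [29.74, 9.42, 9.48],
        "mark_up_id": ["123", "456", "333"],
    }
)


def with_map(frame):
    """map + fillna: look up the mark-up by ID and add it to cost."""
    return (frame["mark_up_id"].map(pb["mark_up"]) + frame["cost"]).fillna(frame["cost"])


def with_join(frame):
    """join: attach the mark-up column first, then add it to cost."""
    joined = frame.join(pb, on="mark_up_id", how="left")
    return (joined["cost"] + joined["mark_up"]).fillna(joined["cost"])


# Both helpers return a new Series and leave `df` untouched,
# so every repetition measures the same work.
n = 50
t_map = min(timeit.repeat(lambda: with_map(df), number=n, repeat=3)) / n
t_join = min(timeit.repeat(lambda: with_join(df), number=n, repeat=3)) / n

print(f"map:  {t_map * 1e6:.0f} µs per call")
print(f"join: {t_join * 1e6:.0f} µs per call")
```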

Finally, I recommend you see: Understanding inplace and When should I use apply

Answer 1 (accepted, score 2), 2019-12-26 19:28:39
