Pandas merge stop at first match like vlookup instead of duplicating
I have two tables, PO data and commodity code data. Some genius decided that some material group codes should be the same, as they are differentiated at a lower level by GL accounts. Because of that, I can't merge on material groups, as I'll get duplicate rows.
Assume the following:
import pandas as pd
d1 = {'PO':[123456,654321,971358], 'matgrp': ["1001",'803A',"803B"]}
d2 = {'matgrp':["1001", "1001", "803A", "803B"], 'commodity':['foo - 10001', 'bar - 10002', 'spam - 100003','eggs - 10003']}
pos = pd.DataFrame(data=d1)
mat_grp = pd.DataFrame(data=d2)
merged = pd.merge(pos, mat_grp, how='left', on='matgrp')
merged.head()
       PO  matgrp      commodity
0  123456    1001    foo - 10001
1  123456    1001    bar - 10002
2  654321    803A  spam - 100003
3  971358    803B   eggs - 10003
As you can see, PO 123456 shows up twice, as there are multiple rows for material 1001 in the material groups table.
The desired behavior is that merge only merges once: it finds the first entry for the material group, adds it, and nothing else, like how vlookup works. The long commodity code might be incorrect in some cases (always showing the first one); that's an acceptable inaccuracy.
P.S.: While suggestions on how to tackle this problem outside the scope of this question are welcome (like merging on GL accounts, which is not feasible for other reasons), assume the following: the available data is a PO list from SAP ME81N and an Excel file with the list of material groups/commodity codes.
pandas' merge behaves (mostly) like a SQL join and will return all combinations of matching keys. If you only want the first match, simply remove the duplicate keys from the data you feed to merge.
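To see where the extra rows come from, it can help to check which keys repeat in the lookup table before merging. The following is a minimal sketch reusing the frames from the question; the `validate` argument is an optional safeguard added here for illustration and is not part of the original answer.

```python
import pandas as pd

pos = pd.DataFrame({'PO': [123456, 654321, 971358],
                    'matgrp': ['1001', '803A', '803B']})
mat_grp = pd.DataFrame({'matgrp': ['1001', '1001', '803A', '803B'],
                        'commodity': ['foo - 10001', 'bar - 10002',
                                      'spam - 100003', 'eggs - 10003']})

# Keys that occur more than once on the right-hand side; each of them
# multiplies the matching rows coming from the left-hand side.
dup_mask = mat_grp['matgrp'].duplicated(keep=False)
dup_keys = mat_grp.loc[dup_mask, 'matgrp'].unique()
print(dup_keys)

# validate='many_to_one' asks merge to raise an error instead of silently
# duplicating rows when the right-hand keys are not unique.
try:
    pd.merge(pos, mat_grp, how='left', on='matgrp', validate='many_to_one')
    merge_failed = False
except pd.errors.MergeError:
    merge_failed = True
print(merge_failed)
```

With the sample data, only material group 1001 is reported as repeated, and the validated merge is rejected until the lookup table is deduplicated.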
Use drop_duplicates on mat_grp:
merged = pd.merge(pos, mat_grp.drop_duplicates('matgrp'), how='left', on='matgrp')
output:
       PO  matgrp      commodity
0  123456    1001    foo - 10001
1  654321    803A  spam - 100003
2  971358    803B   eggs - 10003
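For reference, here is a self-contained sketch of the same approach, assuming the sample frames from the question. The explicit `keep='first'` (which is the default) and `validate='many_to_one'` are optional additions, and the `Series.map` variant is an alternative of my own that reads a bit more like a vlookup; the names `lookup` and `pos_mapped` are made up for this example.

```python
import pandas as pd

pos = pd.DataFrame({'PO': [123456, 654321, 971358],
                    'matgrp': ['1001', '803A', '803B']})
mat_grp = pd.DataFrame({'matgrp': ['1001', '1001', '803A', '803B'],
                        'commodity': ['foo - 10001', 'bar - 10002',
                                      'spam - 100003', 'eggs - 10003']})

# Keep only the first commodity row for each material group.
lookup = mat_grp.drop_duplicates('matgrp', keep='first')

# With unique keys on the right, the left merge returns one row per PO.
merged = pd.merge(pos, lookup, how='left', on='matgrp',
                  validate='many_to_one')

# Alternative: look each matgrp up in a Series indexed by matgrp,
# which leaves the row count and index of pos untouched.
commodity_by_grp = lookup.set_index('matgrp')['commodity']
pos_mapped = pos.assign(commodity=pos['matgrp'].map(commodity_by_grp))

print(merged)
print(pos_mapped)
```

Both variants give one row per PO. The `map` version only carries a single column across, so the merge is probably the better fit if several columns from the material group table are needed.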