Pandas merge stop at first match like vlookup instead of duplicating

I have two tables, PO data and commodity code data. Some genius decided that some material group codes should be the same, as they are differentiated at a lower level by GL accounts. Because of that, I can't merge on material groups, as I'll get duplicate rows.

Assume the following:

import pandas as pd

d1 = {'PO':[123456,654321,971358], 'matgrp': ["1001",'803A',"803B"]}
d2 = {'matgrp':["1001", "1001", "803A", "803B"], 'commodity':['foo - 10001', 'bar - 10002', 'spam - 100003','eggs - 10003']}

pos = pd.DataFrame(data=d1)
mat_grp = pd.DataFrame(data=d2)

merged = pd.merge(pos, mat_grp, how='left', on='matgrp')
merged.head()
       PO matgrp      commodity
0  123456   1001    foo - 10001
1  123456   1001    bar - 10002
2  654321   803A  spam - 100003
3  971358   803B   eggs - 10003

As you can see, PO 123456 shows up twice, as there are multiple rows for material group 1001 in the material groups table.

The desired behavior is that merge only merges once: it finds the first entry for the material group, adds it, and nothing else, like how vlookup works. The long commodity code might be incorrect in some cases (always showing the first one); that's an acceptable inaccuracy.

P.S.: Suggestions on how to tackle this problem outside the scope of this question are welcome (such as merging on GL accounts, which is not feasible for other reasons), but assume the following: the available data is a PO list from SAP ME81N and an Excel file with the list of material groups/commodity codes.

pandas' merge behaves (mostly) like a SQL join and will produce all combinations of matching keys. If you only want the first match for each key, simply remove the later duplicates from the data you feed to merge.
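To see why the row count grows, it can help to count how often each key occurs in the lookup table before merging. A minimal sketch, reusing the sample data from the question (the names `key_counts` and `dup_keys` are illustrative only):

```python
import pandas as pd

pos = pd.DataFrame({'PO': [123456, 654321, 971358],
                    'matgrp': ['1001', '803A', '803B']})
mat_grp = pd.DataFrame({'matgrp': ['1001', '1001', '803A', '803B'],
                        'commodity': ['foo - 10001', 'bar - 10002',
                                      'spam - 100003', 'eggs - 10003']})

# How many rows does each key have in the lookup table?
key_counts = mat_grp['matgrp'].value_counts()

# Keys that occur more than once multiply the matching left-hand rows.
dup_keys = key_counts[key_counts > 1].index.tolist()
print(dup_keys)  # ['1001']

# A left merge yields one row per (left row, matching right row) pair.
merged = pd.merge(pos, mat_grp, how='left', on='matgrp')
print(len(pos), len(merged))  # 3 4
```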

Use drop_duplicates on mat_grp:

merged = pd.merge(pos, mat_grp.drop_duplicates('matgrp'), how='left', on='matgrp')

output:

       PO matgrp      commodity
0  123456   1001    foo - 10001
1  654321   803A  spam - 100003
2  971358   803B   eggs - 10003
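If the intent should be more explicit, one option is to spell out `keep='first'` (already the default of `drop_duplicates`) and let `merge` check that the right-hand keys are unique through its `validate` parameter. Another vlookup-flavoured option is `Series.map` against a deduplicated lookup Series. A sketch under those assumptions, reusing the question's sample data; the names `lookup` and `mapped` are illustrative only:

```python
import pandas as pd

pos = pd.DataFrame({'PO': [123456, 654321, 971358],
                    'matgrp': ['1001', '803A', '803B']})
mat_grp = pd.DataFrame({'matgrp': ['1001', '1001', '803A', '803B'],
                        'commodity': ['foo - 10001', 'bar - 10002',
                                      'spam - 100003', 'eggs - 10003']})

# Keep only the first commodity per material group (keep='first' is the default).
lookup = mat_grp.drop_duplicates(subset='matgrp', keep='first')

# validate='many_to_one' makes merge raise a MergeError if the right-hand keys
# are not unique, which guards against duplicated rows creeping back in.
merged = pd.merge(pos, lookup, how='left', on='matgrp', validate='many_to_one')

# Alternative: map each key through a Series indexed by matgrp.
mapped = pos.assign(
    commodity=pos['matgrp'].map(lookup.set_index('matgrp')['commodity'])
)

print(merged)
print(mapped)
```

Both variants keep one row per PO. The `map` version leaves `NaN` for material groups that are missing from the lookup table, just as the left merge does.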
