Merging two data frames having multiple same values in merging column
I have a DataFrame named SKU_df:
merchant_SKU_filtered uniqueCol
1313030 1313030_0
1409085 1409085_0
1338516 1338516_0
1409093 1409093_0
1409085 1409085_1
1415090 1415090_0
1490663 1490663_0
1490739 1490739_0
1490739 1490739_1
1491455 1491455_0
1490739 1490739_2
1492511 1492511_0
1492529 1492529_0
1571223 1571223_0
1492529 1492529_1
1571223 1571223_1
1571223 1571223_2
1572056 1572056_0
18718 18718_0
2000842 2000842_0
19749 19749_0
2007254 2007254_0
19749 19749_1
2024743 2024743_0
2107688 2107688_0
21505 21505_0
2124634 2124634_0
2166924 2166924_0
21419 21419_0
2422327 2422327_0
2508406 2508406_0
28046 28046_0
2690493 2690493_0
28046 28046_1
2690493 2690493_1
28046 28046_2
28639 28639_0
4064531 4064531_0
3002680 3002680_0
4262531 4262531_0
34363 34363_0
4369302 4369302_0
4369302 4369302_1
4587911 4587911_0
4500658 4500658_0
4591293 4591293_0
4569125 4569125_0
46810 46810_0
And another DataFrame named input_df:
Merchant SKU,Quantity Per Box,NOB,Shipment Status,id_using_regex,prepped_by_initials
1313030 - Rit Dye Drk Grn 8oz 3pk,20,1,Complete,1313030 - Rit Dye Drk Grn 8oz 3pk,w
13296 - Minwax Wax Paste 16oz,45,1,Complete,13296 - Minwax Wax Paste 16oz,Vishal
1338516 - Qukrete Mortar Repair - 5pk,33,1,Complete,1338516 - Qukrete Mortar Repair - 5pk,w
1409085 - Howard Btchr Blck Cndtnr - 5pk,100,2,Complete,1409085 - Howard Btchr Blck Cndtnr - 5pk,w
1409093 - Howard Furniture Wax 3Pk,225,1,Complete,1409093 - Howard Furniture Wax 3Pk,w
1415090 - Werner Ladder Accessories,8,1,Complete,1415090 - Werner Ladder Accessories,w
1436872 - Whink Rust Remover 2Pk,1,1,Complete,1436872 - Whink Rust Remover 2Pk,P
1490663 - 3 pack,4,1,Complete,1490663 - 3 pack,w
1490739 - 6 pack,15,1,Complete,1490739 - 6 pack,A
1490739 - Loctite Blue 242 - 2 pack,23,1,Complete,1490739 - Loctite Blue 242 - 2 pack,B
1490739 - Loctite Blue 242 - 3 pack,99,1,Update AMZ Shipment,1490739 - Loctite Blue 242 - 3 pack,C
1491455 - Granite Gld Plsh Spry 3Pk,100,1,Update AMZ Shipment,1491455 - Granite Gld Plsh Spry 3Pk,w
1492511 - NP1 POLYSEAL WHITE,87,1,Complete,1492511 - NP1 POLYSEAL WHITE,w
1492529 - MasterSeal Sealant/Caulk 4Pk,30,2,Complete,1492529 - MasterSeal Sealant/Caulk 4Pk,w
1571223 - 2 pack,20,3,Complete,1571223 - 2 pack,w
1572056 - Method Dish Pump Refill,40,1,Complete,1572056 - Method Dish Pump Refill,w
1600667 - DAP All Prpse Adhsve 6Pk,22,1,Update AMZ Shipment,1600667 - DAP All Prpse Adhsve 6Pk,
18718 - FLOOD/PPG Additive 2Pk,22,1,Update AMZ Shipment,18718 - FLOOD/PPG Additive 2Pk,w
19749 - Titebond 5004 Prm Wd Glue - 2pk,11,1,Complete,19749 - Titebond 5004 Prm Wd Glue - 2pk,RH
19749 - Titebond II Wood Glue 2Pk,88,1,Complete,19749 - Titebond II Wood Glue 2Pk,RH
2000842 - Powerlock Tape Rule 2Pk,99,1,Complete,2000842 - Powerlock Tape Rule 2Pk,RH
2007254 - DEWALT Claw Hammer,77,1,Complete,2007254 - DEWALT Claw Hammer,RH
2024743 - Dico Nyalox Flap Brush 3Pk,22,1,Update AMZ Shipment,2024743 - Dico Nyalox Flap Brush 3Pk,w
2107688 - Stanley Ftmx Msrng Tpe,34,1,Update AMZ Shipment,2107688 - Stanley Ftmx Msrng Tpe,w
2124634 - Stanley Fat Max Knife,22,1,Update AMZ Shipment,2124634 - Stanley Fat Max Knife,w
21419 - Irwin 81107 No 7 Bit - 5pk,44,1,Update AMZ Shipment,21419 - Irwin 81107 No 7 Bit - 5pk,w
21505 - Irwin 60172 Drill Bit Stand,50,1,Update AMZ Shipment,21505 - Irwin 60172 Drill Bit Stand,RH
2166924 - Stanley Hook Knife,60,1,Update AMZ Shipment,2166924 - Stanley Hook Knife,RH
2422327 - Stanley Surform Round File,75,1,Complete,2422327 - Stanley Surform Round File,w
2508406 - Freud Pilot Bit - 5pk,76,1,Complete,2508406 - Freud Pilot Bit - 5pk,w
2690493 - STANLEY Hex Key Set,40,2,Complete,2690493 - STANLEY Hex Key Set,w
28046 - Arrow Fastener 276 - 12pk,90,1,Complete,28046 - Arrow Fastener 276 - 12pk,RH
28046 - Arrw Fstnr 276 Stpls - 10pk,55,1,Update AMZ Shipment,28046 - Arrw Fstnr 276 Stpls - 10pk,w
28046- Arrow 3/8 staples 2 pk,24,1,Complete,28046- Arrow 3/8 staples 2 pk,w
28639 - 2 pack,24,1,Complete,28639 - 2 pack,w
3002680 - Westinghouse Pull Chain Sckt,2,1,Complete,3002680 - Westinghouse Pull Chain Sckt,w
34363 - Carlon Switch & Outlet Box,24,1,Complete,34363 - Carlon Switch & Outlet Box,RH
4064531 - Korky Valve Rplcmnt,24,1,Update AMZ Shipment,4064531 - Korky Valve Rplcmnt,w
4262531 - Korky Flpper Rplaces Khler 3in,25,1,Update AMZ Shipment,4262531 - Korky Flpper Rplaces Khler 3in,w
4369302 - Korky Toilet Flapper 2Pk,34,1,Complete,4369302 - Korky Toilet Flapper 2Pk,w
4369302 - Korky Unvrsal 3in Flapper,23,1,Complete,4369302 - Korky Unvrsal 3in Flapper,w
4500658 - Enviro-Log Firestrtrs 2PK,12,1,Complete,4500658 - Enviro-Log Firestrtrs 2PK,RH
4569125,12,1,Complete,4569125,w
4587911 - Korky Fill Valve,12,1,Complete,4587911 - Korky Fill Valve,w
4591293 - Mansfield Flapper KIT,12,1,Complete,4591293 - Mansfield Flapper KIT,RH
46810 - Plyprpylne Hsng Wrnch,12,1,Update AMZ Shipment,46810 - Plyprpylne Hsng Wrnch,w
For some Merchant SKU values there are different values of prepped_by_initials. So, after joining these DataFrames, the values get mixed up. I just want the prepped_by_initials column to be mapped onto merchant_SKU_filtered.
This is the code I've tried so far:
input_df['merchant_SKU_filtered'] = input_df['Merchant SKU'].str.split(' ').apply(lambda x: x[0])
input_df['merchant_SKU_filtered'] = input_df['merchant_SKU_filtered'].replace('-', '', regex=True)
input_df['merchant_SKU_filtered'] = input_df['merchant_SKU_filtered'].astype(str)
SKU_df['merchant_SKU_filtered'] = SKU_df['merchant_SKU_filtered'].astype(str)
suffix = input_df.groupby(input_df['merchant_SKU_filtered']).cumcount().astype(str)
keylist1 = list(SKU_df['merchant_SKU_filtered'])
dict_lookup1 = dict(zip(input_df['merchant_SKU_filtered'], input_df['prepped_by_initials']))
SKU_df['key1'] = [dict_lookup1[item] for item in keylist1]
SKU_df['key1'] = SKU_df['key1'].replace(np.nan, ' ', regex=True)
input_df['uniqueCol'] = input_df['merchant_SKU_filtered'] + '_' + suffix
key_list = list(SKU_df['uniqueCol'])
dict_lookup = dict(zip(SKU_df['uniqueCol'], input_df['prepped_by_initials']))
try:
    SKU_df['key2'] = SKU_df['uniqueCol'].map(dict_lookup)
except:
    print("Error")
SKU_df['prepped_by_initials'] = SKU_df['key2'].fillna(SKU_df['key1'])
This gives me a DataFrame, but the values of prepped_by_initials are still not in order. For example, the merchant_SKU_filtered value 1490739 should have the values A, B and C. Instead I'm getting w, A and B, i.e. the values are not getting mapped correctly.
Any suggestions? Any help will be appreciated!
I had a chance to look into your code. The problem that causes the wrong values, e.g. for 1490739, is the way you create your dict_lookups. zip just pairs the two columns row by row, by position, and stops at the shorter one. The inputs you pass to zip have different lengths, so the mapping is wrong.
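To make the positional pairing concrete, here is a small sketch with made-up values (the two lists are illustrative assumptions, not the data from the question). Because zip pairs items by position and stops at the shorter input, keys and values taken from frames with different row order or length end up paired incorrectly:

```python
# Keys as they might appear in one frame, values as they might appear in
# another frame (illustrative values only; order and length differ on purpose).
keys = ["1409085_0", "1338516_0", "1409085_1"]
values = ["w", "A"]  # fewer entries, and not aligned with the keys above

lookup = dict(zip(keys, values))

# zip pairs purely by position and silently drops the unmatched last key.
print(lookup)  # {'1409085_0': 'w', '1338516_0': 'A'}
```

Nothing in zip checks that a key and a value actually belong to the same merchant number, which is why a key-based join is safer here.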
Your SKU_df is longer than input_df and also contains different merchant numbers. What do you want to do with the unique numbers in SKU_df that aren't present in input_df (and so have no prepped_by_initials value)?
If I understand correctly what you want to achieve, you can do a pd.merge instead of building the lookup dicts and mapping them afterwards.
# Extract the merchant number from "Merchant SKU" as a new string column
input_df["merchant_SKU_filtered"] = (
    input_df["Merchant SKU"]
    .str.split(" ")
    .apply(lambda x: x[0])
    .replace("-", "", regex=True)
    .astype(str)
)
# Add a running suffix per merchant number so duplicates get unique keys
suffix = input_df.groupby(input_df["merchant_SKU_filtered"]).cumcount().astype(str)
input_df["uniqueCol"] = input_df["merchant_SKU_filtered"] + "_" + suffix
SKU_df["merchant_SKU_filtered"] = SKU_df["merchant_SKU_filtered"].astype(str)
# merge returns a new DataFrame, so assign the result back
SKU_df = SKU_df.merge(input_df[["uniqueCol", "prepped_by_initials"]],
                      on="uniqueCol",
                      how="left")
print(SKU_df.head(21))
merchant_SKU_filtered uniqueCol prepped_by_initials
0 1313030 1313030_0 w
1 1409085 1409085_0 w
2 1338516 1338516_0 w
3 1409093 1409093_0 w
4 1409085 1409085_1 NaN
5 1415090 1415090_0 w
6 1490663 1490663_0 w
7 1490739 1490739_0 A
8 1490739 1490739_1 B
9 1491455 1491455_0 w
10 1490739 1490739_2 C
11 1492511 1492511_0 w
12 1492529 1492529_0 w
13 1571223 1571223_0 w
14 1492529 1492529_1 NaN
15 1571223 1571223_1 NaN
16 1571223 1571223_2 NaN
17 1572056 1572056_0 w
18 18718 18718_0 w
19 2000842 2000842_0 RH
20 19749 19749_0 RH
As you can see, for your example number 1490739 the mapping is right. If you check the NaN rows, you won't find these uniqueCol values in input_df.
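One way to list the rows without a match is the `indicator` parameter of `merge`, which adds a `_merge` column recording where each row came from. Below is a minimal sketch on a few made-up rows; the tiny frames and the helper `add_unique_col` are assumptions for illustration, not part of the original code:

```python
import pandas as pd

# Small stand-ins for the two frames (illustrative rows only).
SKU_df = pd.DataFrame({
    "merchant_SKU_filtered": ["1490739", "1409085", "1490739", "1409085"],
})
input_df = pd.DataFrame({
    "merchant_SKU_filtered": ["1409085", "1490739", "1490739"],
    "prepped_by_initials": ["w", "A", "B"],
})

def add_unique_col(df):
    # Hypothetical helper: number repeated SKUs 0, 1, 2, ... in order of
    # appearance and append that counter to the SKU.
    suffix = df.groupby("merchant_SKU_filtered").cumcount().astype(str)
    return df.assign(uniqueCol=df["merchant_SKU_filtered"] + "_" + suffix)

SKU_df = add_unique_col(SKU_df)
input_df = add_unique_col(input_df)

merged = SKU_df.merge(
    input_df[["uniqueCol", "prepped_by_initials"]],
    on="uniqueCol",
    how="left",
    indicator=True,  # adds a "_merge" column: "both" or "left_only"
)

# Rows of SKU_df whose uniqueCol has no counterpart in input_df.
unmatched = merged.loc[merged["_merge"] == "left_only", "uniqueCol"].tolist()
print(unmatched)  # ['1409085_1']
```

In this sketch 1409085 appears twice in SKU_df but only once in input_df, so the second occurrence gets NaN, which mirrors the NaN rows in the output above.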