Merging two DataFrames that have repeated values in the merge column
I have a DataFrame named SKU_df:
merchant_SKU_filtered uniqueCol
1313030 1313030_0
1409085 1409085_0
1338516 1338516_0
1409093 1409093_0
1409085 1409085_1
1415090 1415090_0
1490663 1490663_0
1490739 1490739_0
1490739 1490739_1
1491455 1491455_0
1490739 1490739_2
1492511 1492511_0
1492529 1492529_0
1571223 1571223_0
1492529 1492529_1
1571223 1571223_1
1571223 1571223_2
1572056 1572056_0
18718 18718_0
2000842 2000842_0
19749 19749_0
2007254 2007254_0
19749 19749_1
2024743 2024743_0
2107688 2107688_0
21505 21505_0
2124634 2124634_0
2166924 2166924_0
21419 21419_0
2422327 2422327_0
2508406 2508406_0
28046 28046_0
2690493 2690493_0
28046 28046_1
2690493 2690493_1
28046 28046_2
28639 28639_0
4064531 4064531_0
3002680 3002680_0
4262531 4262531_0
34363 34363_0
4369302 4369302_0
4369302 4369302_1
4587911 4587911_0
4500658 4500658_0
4591293 4591293_0
4569125 4569125_0
46810 46810_0
and another DataFrame named input_df:
Merchant SKU,Quantity Per Box,NOB,Shipment Status,id_using_regex,prepped_by_initials
1313030 - Rit Dye Drk Grn 8oz 3pk,20,1,Complete,1313030 - Rit Dye Drk Grn 8oz 3pk,w
13296 - Minwax Wax Paste 16oz,45,1,Complete,13296 - Minwax Wax Paste 16oz,Vishal
1338516 - Qukrete Mortar Repair - 5pk,33,1,Complete,1338516 - Qukrete Mortar Repair - 5pk,w
1409085 - Howard Btchr Blck Cndtnr - 5pk,100,2,Complete,1409085 - Howard Btchr Blck Cndtnr - 5pk,w
1409093 - Howard Furniture Wax 3Pk,225,1,Complete,1409093 - Howard Furniture Wax 3Pk,w
1415090 - Werner Ladder Accessories,8,1,Complete,1415090 - Werner Ladder Accessories,w
1436872 - Whink Rust Remover 2Pk,1,1,Complete,1436872 - Whink Rust Remover 2Pk,P
1490663 - 3 pack,4,1,Complete,1490663 - 3 pack,w
1490739 - 6 pack,15,1,Complete,1490739 - 6 pack,A
1490739 - Loctite Blue 242 - 2 pack,23,1,Complete,1490739 - Loctite Blue 242 - 2 pack,B
1490739 - Loctite Blue 242 - 3 pack,99,1,Update AMZ Shipment,1490739 - Loctite Blue 242 - 3 pack,C
1491455 - Granite Gld Plsh Spry 3Pk,100,1,Update AMZ Shipment,1491455 - Granite Gld Plsh Spry 3Pk,w
1492511 - NP1 POLYSEAL WHITE,87,1,Complete,1492511 - NP1 POLYSEAL WHITE,w
1492529 - MasterSeal Sealant/Caulk 4Pk,30,2,Complete,1492529 - MasterSeal Sealant/Caulk 4Pk,w
1571223 - 2 pack,20,3,Complete,1571223 - 2 pack,w
1572056 - Method Dish Pump Refill,40,1,Complete,1572056 - Method Dish Pump Refill,w
1600667 - DAP All Prpse Adhsve 6Pk,22,1,Update AMZ Shipment,1600667 - DAP All Prpse Adhsve 6Pk,
18718 - FLOOD/PPG Additive 2Pk,22,1,Update AMZ Shipment,18718 - FLOOD/PPG Additive 2Pk,w
19749 - Titebond 5004 Prm Wd Glue - 2pk,11,1,Complete,19749 - Titebond 5004 Prm Wd Glue - 2pk,RH
19749 - Titebond II Wood Glue 2Pk,88,1,Complete,19749 - Titebond II Wood Glue 2Pk,RH
2000842 - Powerlock Tape Rule 2Pk,99,1,Complete,2000842 - Powerlock Tape Rule 2Pk,RH
2007254 - DEWALT Claw Hammer,77,1,Complete,2007254 - DEWALT Claw Hammer,RH
2024743 - Dico Nyalox Flap Brush 3Pk,22,1,Update AMZ Shipment,2024743 - Dico Nyalox Flap Brush 3Pk,w
2107688 - Stanley Ftmx Msrng Tpe,34,1,Update AMZ Shipment,2107688 - Stanley Ftmx Msrng Tpe,w
2124634 - Stanley Fat Max Knife,22,1,Update AMZ Shipment,2124634 - Stanley Fat Max Knife,w
21419 - Irwin 81107 No 7 Bit - 5pk,44,1,Update AMZ Shipment,21419 - Irwin 81107 No 7 Bit - 5pk,w
21505 - Irwin 60172 Drill Bit Stand,50,1,Update AMZ Shipment,21505 - Irwin 60172 Drill Bit Stand,RH
2166924 - Stanley Hook Knife,60,1,Update AMZ Shipment,2166924 - Stanley Hook Knife,RH
2422327 - Stanley Surform Round File,75,1,Complete,2422327 - Stanley Surform Round File,w
2508406 - Freud Pilot Bit - 5pk,76,1,Complete,2508406 - Freud Pilot Bit - 5pk,w
2690493 - STANLEY Hex Key Set,40,2,Complete,2690493 - STANLEY Hex Key Set,w
28046 - Arrow Fastener 276 - 12pk,90,1,Complete,28046 - Arrow Fastener 276 - 12pk,RH
28046 - Arrw Fstnr 276 Stpls - 10pk,55,1,Update AMZ Shipment,28046 - Arrw Fstnr 276 Stpls - 10pk,w
28046- Arrow 3/8 staples 2 pk,24,1,Complete,28046- Arrow 3/8 staples 2 pk,w
28639 - 2 pack,24,1,Complete,28639 - 2 pack,w
3002680 - Westinghouse Pull Chain Sckt,2,1,Complete,3002680 - Westinghouse Pull Chain Sckt,w
34363 - Carlon Switch & Outlet Box,24,1,Complete,34363 - Carlon Switch & Outlet Box,RH
4064531 - Korky Valve Rplcmnt,24,1,Update AMZ Shipment,4064531 - Korky Valve Rplcmnt,w
4262531 - Korky Flpper Rplaces Khler 3in,25,1,Update AMZ Shipment,4262531 - Korky Flpper Rplaces Khler 3in,w
4369302 - Korky Toilet Flapper 2Pk,34,1,Complete,4369302 - Korky Toilet Flapper 2Pk,w
4369302 - Korky Unvrsal 3in Flapper,23,1,Complete,4369302 - Korky Unvrsal 3in Flapper,w
4500658 - Enviro-Log Firestrtrs 2PK,12,1,Complete,4500658 - Enviro-Log Firestrtrs 2PK,RH
4569125,12,1,Complete,4569125,w
4587911 - Korky Fill Valve,12,1,Complete,4587911 - Korky Fill Valve,w
4591293 - Mansfield Flapper KIT,12,1,Complete,4591293 - Mansfield Flapper KIT,RH
46810 - Plyprpylne Hsng Wrnch,12,1,Update AMZ Shipment,46810 - Plyprpylne Hsng Wrnch,w
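For some Merchant SKUs there are several different prepped_by_initials values, so after joining these DataFrames the values get mixed up. I only want to map the prepped_by_initials column onto merchant_SKU_filtered.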
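The mix-up can be reproduced with a minimal sketch. It assumes two tiny hypothetical stand-ins, sku_df and inp, holding only the three 1490739 rows from the sample data. Joining on the repeated SKU alone pairs every row with every row of the same SKU, while a key that numbers the duplicates (the same idea as uniqueCol) restores a one-to-one match.

```python
import pandas as pd

# Hypothetical stand-ins: only the three 1490739 rows from each frame
sku_df = pd.DataFrame({
    "merchant_SKU_filtered": ["1490739", "1490739", "1490739"],
    "uniqueCol": ["1490739_0", "1490739_1", "1490739_2"],
})
inp = pd.DataFrame({
    "merchant_SKU_filtered": ["1490739", "1490739", "1490739"],
    "prepped_by_initials": ["A", "B", "C"],
})

# Joining on the duplicated key alone yields every combination: 3 x 3 rows
naive = sku_df.merge(inp, on="merchant_SKU_filtered", how="left")
print(len(naive))  # 9

# Number the duplicates 0, 1, 2 in order of appearance to make the key unique
inp["uniqueCol"] = (inp["merchant_SKU_filtered"] + "_"
                    + inp.groupby("merchant_SKU_filtered").cumcount().astype(str))
fixed = sku_df.merge(inp[["uniqueCol", "prepped_by_initials"]],
                     on="uniqueCol", how="left")
print(fixed["prepped_by_initials"].tolist())  # ['A', 'B', 'C']
```

Note that the cumcount numbering assumes both frames list a SKU's duplicates in the same order; if they do not, the suffixes will line up the wrong rows.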
Here is the code I have tried so far:
import numpy as np
# extract the leading SKU number and remove any "-" (e.g. "28046-" -> "28046")
input_df['merchant_SKU_filtered'] = input_df['Merchant SKU'].str.split(' ').apply(lambda x: x[0])
input_df['merchant_SKU_filtered'] = input_df['merchant_SKU_filtered'].replace('-', '', regex=True)
input_df['merchant_SKU_filtered'] = input_df['merchant_SKU_filtered'].astype(str)
SKU_df['merchant_SKU_filtered'] = SKU_df['merchant_SKU_filtered'].astype(str)
# number repeated SKUs 0, 1, 2, ... in order of appearance
suffix = input_df.groupby(input_df['merchant_SKU_filtered']).cumcount().astype(str)
# lookup 1: SKU -> initials (for a duplicated SKU, the last row wins)
keylist1 = list(SKU_df['merchant_SKU_filtered'])
dict_lookup1 = dict(zip(input_df['merchant_SKU_filtered'], input_df['prepped_by_initials']))
SKU_df['key1'] = [dict_lookup1[item] for item in keylist1]
SKU_df['key1'] = SKU_df['key1'].replace(np.nan, ' ', regex=True)
input_df['uniqueCol'] = input_df['merchant_SKU_filtered'] + '_' + suffix
key_list = list(SKU_df['uniqueCol'])
# lookup 2: pairs SKU_df['uniqueCol'] with input_df['prepped_by_initials'] by row position
dict_lookup = dict(zip(SKU_df['uniqueCol'], input_df['prepped_by_initials']))
try:
    SKU_df['key2'] = SKU_df['uniqueCol'].map(dict_lookup)
except Exception:
    print("Error")
# use key2 where available, otherwise fall back to key1
SKU_df['prepped_by_initials'] = SKU_df['key2'].fillna(SKU_df['key1'])
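This gives me a DataFrame, but the prepped_by_initials values are still not in the right order. For example, the merchant_SKU_filtered value 1490739 should get the values A, B and C, but I get w, A and B instead, so the values are not mapped correctly.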
Any suggestions? Any help would be greatly appreciated!
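I had a chance to look at your code. What causes the wrong values (e.g. for 1490739) is the way you create the dict_lookups. zip simply pairs the two columns up row by row. In dict_lookup, the two columns you pass to zip come from different DataFrames (SKU_df and input_df), which have different lengths and a different row order, so keys end up paired with the wrong values.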
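A minimal sketch of that pitfall, using hypothetical trimmed copies (sku_df, inp) of the first rows of each frame with uniqueCol already derived: zip pairs values purely by position and stops at the shorter input, whereas a Series indexed by uniqueCol keeps each key together with the initials from its own row.

```python
import pandas as pd

# Hypothetical trimmed copies of the first rows of SKU_df and input_df
sku_df = pd.DataFrame({"uniqueCol": ["1313030_0", "1409085_0", "1338516_0",
                                     "1409093_0", "1409085_1"]})
inp = pd.DataFrame({
    "uniqueCol": ["1313030_0", "13296_0", "1338516_0", "1409085_0"],
    "prepped_by_initials": ["w", "Vishal", "w", "w"],
})

# Wrong: keys from one frame, values from another, paired by row position only;
# zip also silently drops the fifth key because inp has just four rows
bad_lookup = dict(zip(sku_df["uniqueCol"], inp["prepped_by_initials"]))
print(bad_lookup["1409085_0"])  # Vishal (should be w)

# Right: key and value come from the same row of inp
good_lookup = inp.set_index("uniqueCol")["prepped_by_initials"]
mapped = sku_df["uniqueCol"].map(good_lookup)
# 1409093_0 and 1409085_1 are not in the trimmed inp, so they map to NaN
print(mapped.tolist())  # ['w', 'w', 'w', nan, nan]
```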
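Your SKU_df is longer than input_df, and the merchant numbers differ as well. How do you want to handle unique numbers in SKU_df that are not present in input_df (and therefore have no prepped value)?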
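One way to find and handle those rows is sketched below, assuming small hypothetical frames sku_df and inp. It uses merge's indicator=True to flag SKU_df rows that have no partner in input_df, then shows two possible policies: keep them with a placeholder, or drop them with an inner join. Which policy fits depends on what you need downstream.

```python
import pandas as pd

# Hypothetical frames: 1409085 appears twice in sku_df but only once in inp
sku_df = pd.DataFrame({"uniqueCol": ["1409085_0", "1409085_1", "1415090_0"]})
inp = pd.DataFrame({
    "uniqueCol": ["1409085_0", "1415090_0"],
    "prepped_by_initials": ["w", "w"],
})

# indicator=True adds a "_merge" column: "both" or "left_only" for a left join
merged = sku_df.merge(inp, on="uniqueCol", how="left", indicator=True)
missing = merged.loc[merged["_merge"] == "left_only", "uniqueCol"]
print(missing.tolist())  # ['1409085_1']

# Policy 1: keep the rows and fill a placeholder (the question's code used ' ')
filled = merged["prepped_by_initials"].fillna(" ")

# Policy 2: drop them by using an inner join instead
inner = sku_df.merge(inp, on="uniqueCol", how="inner")
print(len(inner))  # 2
```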
IIUC what you want to achieve, you can use pd.merge instead of building a lookup_dict and mapping it afterwards:
# extract the merchant numbers in input_df as a new column
input_df["merchant_SKU_filtered"] = (
    input_df["Merchant SKU"].str.split(" ").apply(lambda x: x[0])
    .replace("-", "", regex=True)
    .astype(str)
)
# add a suffix so the numbers are unique (in case of duplicates)
suffix = input_df.groupby(input_df["merchant_SKU_filtered"]).cumcount().astype(str)
input_df["uniqueCol"] = input_df["merchant_SKU_filtered"] + "_" + suffix
SKU_df["merchant_SKU_filtered"] = SKU_df["merchant_SKU_filtered"].astype(str)
# merge returns a new DataFrame, so assign it back; a left merge keeps SKU_df's row order
SKU_df = SKU_df.merge(input_df[["uniqueCol", "prepped_by_initials"]],
                      on="uniqueCol",
                      how="left")
print(SKU_df.head(20))
merchant_SKU_filtered uniqueCol prepped_by_initials
0 1313030 1313030_0 w
1 1409085 1409085_0 w
2 1338516 1338516_0 w
3 1409093 1409093_0 w
4 1409085 1409085_1 NaN
5 1415090 1415090_0 w
6 1490663 1490663_0 w
7 1490739 1490739_0 A
8 1490739 1490739_1 B
9 1491455 1491455_0 w
10 1490739 1490739_2 C
11 1492511 1492511_0 w
12 1492529 1492529_0 w
13 1571223 1571223_0 w
14 1492529 1492529_1 NaN
15 1571223 1571223_1 NaN
16 1571223 1571223_2 NaN
17 1572056 1572056_0 w
18 18718 18718_0 w
19 2000842 2000842_0 RH
20 19749 19749_0 RH
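As you can see for the example number 1490739, the mapping is now correct. If you check the NaN rows, you will not find their uniqueCol values in input_df's uniqueCol.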
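That check can also be done in code. A minimal sketch, assuming hypothetical trimmed frames that mirror the 1492529 and 1571223 rows above: after the left merge, collect the keys whose initials are NaN and confirm with isin that none of them occur in input_df's uniqueCol.

```python
import pandas as pd

# Hypothetical trimmed frames mirroring the 1492529 / 1571223 rows above
sku_df = pd.DataFrame({"uniqueCol": ["1492529_0", "1492529_1", "1571223_0",
                                     "1571223_1", "1571223_2"]})
inp = pd.DataFrame({
    "uniqueCol": ["1492529_0", "1571223_0"],
    "prepped_by_initials": ["w", "w"],
})

merged = sku_df.merge(inp, on="uniqueCol", how="left")

# Keys that came back without initials after the left merge
nan_keys = merged.loc[merged["prepped_by_initials"].isna(), "uniqueCol"]
print(nan_keys.tolist())  # ['1492529_1', '1571223_1', '1571223_2']

# None of them exist in inp's uniqueCol, which is why they are NaN
print(nan_keys.isin(inp["uniqueCol"]).any())  # False
```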
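Disclaimer: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.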