Merging two data frames having multiple same values in merging column

I have a DataFrame named SKU_df:

merchant_SKU_filtered   uniqueCol
1313030 1313030_0
1409085 1409085_0
1338516 1338516_0
1409093 1409093_0
1409085 1409085_1
1415090 1415090_0
1490663 1490663_0
1490739 1490739_0
1490739 1490739_1
1491455 1491455_0
1490739 1490739_2
1492511 1492511_0
1492529 1492529_0
1571223 1571223_0
1492529 1492529_1
1571223 1571223_1
1571223 1571223_2
1572056 1572056_0
18718   18718_0
2000842 2000842_0
19749   19749_0
2007254 2007254_0
19749   19749_1
2024743 2024743_0
2107688 2107688_0
21505   21505_0
2124634 2124634_0
2166924 2166924_0
21419   21419_0
2422327 2422327_0
2508406 2508406_0
28046   28046_0
2690493 2690493_0
28046   28046_1
2690493 2690493_1
28046   28046_2
28639   28639_0
4064531 4064531_0
3002680 3002680_0
4262531 4262531_0
34363   34363_0
4369302 4369302_0
4369302 4369302_1
4587911 4587911_0
4500658 4500658_0
4591293 4591293_0
4569125 4569125_0
46810   46810_0

and another DataFrame named input_df:

Merchant SKU,Quantity Per Box,NOB,Shipment Status,id_using_regex,prepped_by_initials
1313030 - Rit Dye Drk Grn 8oz 3pk,20,1,Complete,1313030 - Rit Dye Drk Grn 8oz 3pk,w
13296 - Minwax Wax Paste 16oz,45,1,Complete,13296 - Minwax Wax Paste 16oz,Vishal
1338516 - Qukrete Mortar Repair - 5pk,33,1,Complete,1338516 - Qukrete Mortar Repair - 5pk,w
1409085 - Howard Btchr Blck Cndtnr - 5pk,100,2,Complete,1409085 - Howard Btchr Blck Cndtnr - 5pk,w
1409093 - Howard Furniture Wax 3Pk,225,1,Complete,1409093 - Howard Furniture Wax 3Pk,w
1415090 - Werner Ladder Accessories,8,1,Complete,1415090 - Werner Ladder Accessories,w
1436872 - Whink Rust Remover 2Pk,1,1,Complete,1436872 - Whink Rust Remover 2Pk,P
1490663 - 3 pack,4,1,Complete,1490663 - 3 pack,w
1490739 - 6 pack,15,1,Complete,1490739 - 6 pack,A
1490739 - Loctite Blue 242 - 2 pack,23,1,Complete,1490739 - Loctite Blue 242 - 2 pack,B
1490739 - Loctite Blue 242 - 3 pack,99,1,Update AMZ Shipment,1490739 - Loctite Blue 242 - 3 pack,C
1491455 - Granite Gld Plsh Spry 3Pk,100,1,Update AMZ Shipment,1491455 - Granite Gld Plsh Spry 3Pk,w
1492511 - NP1 POLYSEAL WHITE,87,1,Complete,1492511 - NP1 POLYSEAL WHITE,w
1492529 - MasterSeal Sealant/Caulk 4Pk,30,2,Complete,1492529 - MasterSeal Sealant/Caulk 4Pk,w
1571223 - 2 pack,20,3,Complete,1571223 - 2 pack,w
1572056 - Method Dish Pump Refill,40,1,Complete,1572056 - Method Dish Pump Refill,w
1600667 - DAP All Prpse Adhsve 6Pk,22,1,Update AMZ Shipment,1600667 - DAP All Prpse Adhsve 6Pk,
18718 - FLOOD/PPG Additive 2Pk,22,1,Update AMZ Shipment,18718 - FLOOD/PPG Additive 2Pk,w
19749 - Titebond 5004 Prm Wd Glue - 2pk,11,1,Complete,19749 - Titebond 5004 Prm Wd Glue - 2pk,RH
19749 - Titebond II Wood Glue 2Pk,88,1,Complete,19749 - Titebond II Wood Glue 2Pk,RH
2000842 - Powerlock Tape Rule 2Pk,99,1,Complete,2000842 - Powerlock Tape Rule 2Pk,RH
2007254 - DEWALT Claw Hammer,77,1,Complete,2007254 - DEWALT Claw Hammer,RH
2024743 - Dico Nyalox Flap Brush 3Pk,22,1,Update AMZ Shipment,2024743 - Dico Nyalox Flap Brush 3Pk,w
2107688 - Stanley Ftmx Msrng Tpe,34,1,Update AMZ Shipment,2107688 - Stanley Ftmx Msrng Tpe,w
2124634 - Stanley Fat Max Knife,22,1,Update AMZ Shipment,2124634 - Stanley Fat Max Knife,w
21419 - Irwin 81107 No 7 Bit - 5pk,44,1,Update AMZ Shipment,21419 - Irwin 81107 No 7 Bit - 5pk,w
21505 - Irwin 60172 Drill Bit Stand,50,1,Update AMZ Shipment,21505 - Irwin 60172 Drill Bit Stand,RH
2166924 - Stanley Hook Knife,60,1,Update AMZ Shipment,2166924 - Stanley Hook Knife,RH
2422327 - Stanley Surform Round File,75,1,Complete,2422327 - Stanley Surform Round File,w
2508406 - Freud Pilot Bit - 5pk,76,1,Complete,2508406 - Freud Pilot Bit - 5pk,w
2690493 - STANLEY Hex Key Set,40,2,Complete,2690493 - STANLEY Hex Key Set,w
28046 - Arrow Fastener 276 - 12pk,90,1,Complete,28046 - Arrow Fastener 276 - 12pk,RH
28046 - Arrw Fstnr 276 Stpls - 10pk,55,1,Update AMZ Shipment,28046 - Arrw Fstnr 276 Stpls - 10pk,w
28046- Arrow 3/8 staples 2 pk,24,1,Complete,28046- Arrow 3/8 staples 2 pk,w
28639 - 2 pack,24,1,Complete,28639 - 2 pack,w
3002680 - Westinghouse Pull Chain Sckt,2,1,Complete,3002680 - Westinghouse Pull Chain Sckt,w
34363 - Carlon Switch & Outlet Box,24,1,Complete,34363 - Carlon Switch & Outlet Box,RH
4064531 - Korky Valve Rplcmnt,24,1,Update AMZ Shipment,4064531 - Korky Valve Rplcmnt,w
4262531 - Korky Flpper Rplaces Khler 3in,25,1,Update AMZ Shipment,4262531 - Korky Flpper Rplaces Khler 3in,w
4369302 - Korky Toilet Flapper 2Pk,34,1,Complete,4369302 - Korky Toilet Flapper 2Pk,w
4369302 - Korky Unvrsal 3in Flapper,23,1,Complete,4369302 - Korky Unvrsal 3in Flapper,w
4500658 - Enviro-Log Firestrtrs 2PK,12,1,Complete,4500658 - Enviro-Log Firestrtrs 2PK,RH
4569125,12,1,Complete,4569125,w
4587911 - Korky Fill Valve,12,1,Complete,4587911 - Korky Fill Valve,w
4591293 - Mansfield Flapper KIT,12,1,Complete,4591293 - Mansfield Flapper KIT,RH
46810 - Plyprpylne Hsng Wrnch,12,1,Update AMZ Shipment,46810 - Plyprpylne Hsng Wrnch,w

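Some Merchant SKUs have several different prepped_by_initials values, so after joining these data frames the values get mixed up. I only want to map the prepped_by_initials column onto merchant_SKU_filtered.

This is the code I have tried so far: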

input_df['merchant_SKU_filtered'] = input_df['Merchant SKU'].str.split(' ').apply(lambda x: x[0])
input_df['merchant_SKU_filtered'] = input_df['merchant_SKU_filtered'].replace('-', '', regex=True)
input_df['merchant_SKU_filtered'] = input_df['merchant_SKU_filtered'].astype(str)
SKU_df['merchant_SKU_filtered'] = SKU_df['merchant_SKU_filtered'].astype(str)

suffix = input_df.groupby(input_df['merchant_SKU_filtered']).cumcount().astype(str)

keylist1 = list(SKU_df['merchant_SKU_filtered'])
dict_lookup1 = dict(zip(input_df['merchant_SKU_filtered'], input_df['prepped_by_initials']))
SKU_df['key1'] = [dict_lookup1[item] for item in keylist1]
SKU_df['key1'] = SKU_df['key1'].replace(np.nan, ' ', regex=True)

input_df['uniqueCol'] = input_df['merchant_SKU_filtered'] + '_' + suffix
key_list = list(SKU_df['uniqueCol'])
dict_lookup = dict(zip(SKU_df['uniqueCol'], input_df['prepped_by_initials']))
try:
    SKU_df['key2'] = SKU_df['uniqueCol'].map(dict_lookup)
except:
    print("Error")

SKU_df['prepped_by_initials'] = SKU_df['key2'].fillna(SKU_df['key1'])

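This gives me a DataFrame, but the prepped_by_initials values are still not in the right order. For example, merchant_SKU_filtered 1490739 should have the values A, B, C, but I get w, A, B instead, so the values are not mapped correctly.

Any suggestions? Any help would be greatly appreciated!!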

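I had a chance to look at your code. The problem that causes the wrong values (e.g. for 1490739) is the way you create your dict_lookups. zip just puts the two columns together row by row, by position. The inputs to your zip come from different data frames with different lengths and row order, so the mapping is wrong.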
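A minimal sketch of why a positional zip goes wrong, using two tiny made-up frames (the names `left` and `right` and all values in them are illustrative assumptions, not the question's data):

```python
import pandas as pd

# Hypothetical miniature stand-ins for SKU_df and input_df; values are made up.
left = pd.DataFrame({"uniqueCol": ["100_0", "200_0", "100_1", "300_0"]})
right = pd.DataFrame({
    "uniqueCol": ["100_0", "100_1", "200_0"],
    "prepped_by_initials": ["A", "B", "w"],
})

# zip pairs the first element with the first, the second with the second, ...
# It never compares keys, and it silently stops at the shorter input.
positional = dict(zip(left["uniqueCol"], right["prepped_by_initials"]))
print(positional)

# Taking key and value from the same frame keeps each pair aligned.
aligned = dict(zip(right["uniqueCol"], right["prepped_by_initials"]))
mapped = left["uniqueCol"].map(aligned)
print(mapped.tolist())
```

In the positional dictionary, "200_0" ends up with "B" and "100_1" with "w", although `right` says the opposite. The aligned dictionary gives the intended values and leaves keys that are missing from `right` as NaN.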

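Your SKU_df is longer than input_df, and the merchant numbers differ as well. How do you want to handle the unique numbers in SKU_df that do not exist in input_df (and therefore have no prepped value)?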
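One possible way to see which keys have no counterpart before deciding how to treat them is the `indicator` option of a left merge. This is only a sketch on made-up data: the frames `sku` and `inp` are hypothetical, and filling the gaps with an empty string is just one choice among several.

```python
import pandas as pd

# Hypothetical small frames; the values are made up for illustration.
sku = pd.DataFrame({"uniqueCol": ["100_0", "100_1", "200_0"]})
inp = pd.DataFrame({
    "uniqueCol": ["100_0", "200_0"],
    "prepped_by_initials": ["A", "w"],
})

# indicator=True adds a "_merge" column saying where each row was found.
merged = sku.merge(inp, on="uniqueCol", how="left", indicator=True)

# Rows marked "left_only" exist in sku but have no match in inp.
unmatched = merged.loc[merged["_merge"] == "left_only", "uniqueCol"]
print(unmatched.tolist())

# One option (an assumption, not a requirement): replace the missing values.
merged["prepped_by_initials"] = merged["prepped_by_initials"].fillna("")
```

Whether the unmatched rows should be dropped, left as NaN, or filled with a placeholder depends on what the result is used for.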

If I understand correctly what you want to achieve, you can do a pd.merge instead of building lookup dictionaries and mapping them afterwards.

# Extract the merchant number from "Merchant SKU" in input_df as a new column
input_df["merchant_SKU_filtered"] = (
    input_df["Merchant SKU"].str.split(" ").apply(lambda x: x[0])
    .replace("-", "", regex=True)
    .astype(str)
)

# Add a running suffix per merchant number so duplicates get unique keys
suffix = input_df.groupby(input_df["merchant_SKU_filtered"]).cumcount().astype(str)
input_df["uniqueCol"] = input_df["merchant_SKU_filtered"] + "_" + suffix

SKU_df["merchant_SKU_filtered"] = SKU_df["merchant_SKU_filtered"].astype(str)

# merge returns a new DataFrame, so assign the result back
SKU_df = SKU_df.merge(input_df[["uniqueCol", "prepped_by_initials"]],
    on="uniqueCol",
    how="left")

# Show the first 21 rows (index 0 to 20)
print(SKU_df.head(21))

    merchant_SKU_filtered   uniqueCol   prepped_by_initials
0                 1313030   1313030_0   w
1                 1409085   1409085_0   w
2                 1338516   1338516_0   w
3                 1409093   1409093_0   w
4                 1409085   1409085_1   NaN
5                 1415090   1415090_0   w
6                 1490663   1490663_0   w
7                 1490739   1490739_0   A
8                 1490739   1490739_1   B
9                 1491455   1491455_0   w
10                1490739   1490739_2   C
11                1492511   1492511_0   w
12                1492529   1492529_0   w
13                1571223   1571223_0   w
14                1492529   1492529_1   NaN
15                1571223   1571223_1   NaN
16                1571223   1571223_2   NaN
17                1572056   1572056_0   w
18                18718     18718_0     w
19                2000842   2000842_0   RH
20                19749     19749_0     RH

As you can see for the example number 1490739, the mapping is now correct. If you check the rows with NaN, you will not find their uniqueCol values in input_df.
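The claim about the NaN rows can be checked directly with `isin`. The following is a sketch on made-up frames (`sku`, `inp` and their values are hypothetical); with the real data you would use SKU_df and input_df instead.

```python
import pandas as pd

# Hypothetical small frames; the values are made up for illustration.
sku = pd.DataFrame({"uniqueCol": ["100_0", "100_1", "200_0"]})
inp = pd.DataFrame({
    "uniqueCol": ["100_0", "200_0"],
    "prepped_by_initials": ["A", "w"],
})
result = sku.merge(inp, on="uniqueCol", how="left")

# Keys of the rows that received no prepped_by_initials value
nan_keys = result.loc[result["prepped_by_initials"].isna(), "uniqueCol"]

# Test each of those keys for membership in the other frame's key column
found = nan_keys.isin(inp["uniqueCol"])
print(nan_keys.tolist(), found.any())
```

If `found.any()` is False, every NaN comes from a key that is simply absent from the other frame. Note that this check would not tell the two cases apart if the source column itself contained NaN values.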
