How to compare columns of two dataframes to add a mapping
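I have two dataframes, shown below, and I am trying to set the CLASSIFICATION value in dataframe 1 based on the ITEM/CODE column of dataframe 2. If DESC contains a word that matches an ITEM/CODE, I need to take the corresponding TYPE from dataframe2.
To do this, I split the DESC string and tried to compare the resulting list entries with the ITEM/CODE values of dataframe2. Any ideas on how to do this?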
dataframe1
PN      DESC                                   CLASSIFICATION
C23890  Resistor 2.21K elec
C23891  Powerswitch
C23892  Resistor 7.5K
C23893  Resistor .1K
C23894  FET elec
C23895  ELE SD Card adapter
C23896  Crystal 16Mhz
C23897  Capacitor 100uF
C23898  ELECTRONICS Resistor 10K
C23899  M3x5 Socket Cap Bolt MECH
C23900  M3x6 Socket Cap Bolt Mech
C23901  Mehcanical Assemble Kapton Tape 120mm
C23902  MK7 Filament Drive Block Front
C23903  Pulley 5mm shaft
dataframe2
ITEM/CODE    TYPE
ELE          ELECTRONIC
ELECTRONICS  ELECTRONIC
Capacitor    ELECTRONIC
Resistor     ELECTRONIC
Washer       MECHANICAL
MECH         MECHANICAL
This is the code I have written so far.
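import pandas as pd

# raw strings keep the backslashes in the Windows paths literal
fn = r'D:\PartsExport.xlsx'
dfInput = pd.read_excel(fn, sheet_name='Sheet1')
fn_type = r'D:\TypeMaster.xlsx'
dfType = pd.read_excel(fn_type, sheet_name='Sheet1')
# split each description into a list of words
dfInput['DESC_SPLIT'] = dfInput['DESC'].str.split(' ', n=-1, expand=False)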
Result
PN      DESC                       CLASSIFICATION
C23890  Resistor 2.21K elec        ELECTRONIC
C23891  Powerswitch                ELECTRONIC
C23892  Resistor 7.5K              ELECTRONIC
C23893  Resistor .1K               ELECTRONIC
C23899  M3x5 Socket Cap Bolt MECH  MECHANICAL
Use Series.str.contains in a loop over a Series created from dataframe2. The flags=re.I argument makes the match case-insensitive:
import re

for k, v in dataframe2.set_index('ITEM/CODE')['TYPE'].items():
    # match whole words only; re.escape guards against regex metacharacters in a code
    pat = r"\b{}\b".format(re.escape(k))
    # to match substrings instead:
    # pat = re.escape(k)
    dataframe1.loc[dataframe1['DESC'].str.contains(pat, flags=re.I), 'CLASSIFICATION'] = v
print(dataframe1)
    PN      DESC                                   CLASSIFICATION
0   C23890  Resistor 2.21K elec                    ELECTRONIC
1   C23891  Powerswitch                            NaN
2   C23892  Resistor 7.5K                          ELECTRONIC
3   C23893  Resistor .1K                           ELECTRONIC
4   C23894  FET elec                               NaN
5   C23895  ELE SD Card adapter                    ELECTRONIC
6   C23896  Crystal 16Mhz                          NaN
7   C23897  Capacitor 100uF                        ELECTRONIC
8   C23898  ELECTRONICS Resistor 10K               ELECTRONIC
9   C23899  M3x5 Socket Cap Bolt MECH              MECHANICAL
10  C23900  M3x6 Socket Cap Bolt Mech              MECHANICAL
11  C23901  Mehcanical Assemble Kapton Tape 120mm  NaN
12  C23902  MK7 Filament Drive Block Front         NaN
13  C23903  Pulley 5mm shaft                       NaN
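As a possible alternative to looping over every keyword, the keywords can be combined into one regular expression and extracted in a single pass. The sketch below is only an illustration on a few rows of the question's data; the names lookup, keywords, pattern and first_hit are made up for this example. Note one behavioural difference: here the first keyword found in the text decides the TYPE, whereas in the loop above the last keyword processed overwrites earlier ones.

```python
import re
import pandas as pd

# A few rows from the question's sample data
dataframe1 = pd.DataFrame({
    "PN": ["C23890", "C23891", "C23895", "C23900"],
    "DESC": ["Resistor 2.21K elec", "Powerswitch",
             "ELE SD Card adapter", "M3x6 Socket Cap Bolt Mech"],
})
dataframe2 = pd.DataFrame({
    "ITEM/CODE": ["ELE", "ELECTRONICS", "Capacitor", "Resistor", "Washer", "MECH"],
    "TYPE": ["ELECTRONIC", "ELECTRONIC", "ELECTRONIC",
             "ELECTRONIC", "MECHANICAL", "MECHANICAL"],
})

# Lower-cased lookup Series: keyword -> TYPE (illustrative name)
lookup = pd.Series(dataframe2["TYPE"].values,
                   index=dataframe2["ITEM/CODE"].str.lower())

# One alternation of all keywords, longest first, bounded by word boundaries
keywords = sorted(dataframe2["ITEM/CODE"], key=len, reverse=True)
pattern = r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b"

# First keyword found in each description (NaN when none is found)
first_hit = dataframe1["DESC"].str.extract(pattern, flags=re.I, expand=False)

# Normalise the hit to lower case and translate it to its TYPE
dataframe1["CLASSIFICATION"] = first_hit.str.lower().map(lookup)
print(dataframe1)
```

Because the word boundaries are kept, "elec" in the first row still does not match "ELE"; that row is classified through "Resistor".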
If you only want to match the first word, use Series.map, but first convert both sides to lowercase with Series.str.lower:
dataframe2['ITEM/CODE'] = dataframe2['ITEM/CODE'].str.lower()
s = dataframe2.set_index('ITEM/CODE')['TYPE']
dataframe1['CLASSIFICATION'] = dataframe1['DESC'].str.split().str[0].str.lower().map(s)
print (dataframe1)
    PN      DESC                                   CLASSIFICATION
0   C23890  Resistor 2.21K elec                    ELECTRONIC
1   C23891  Powerswitch                            NaN
2   C23892  Resistor 7.5K                          ELECTRONIC
3   C23893  Resistor .1K                           ELECTRONIC
4   C23894  FET elec                               NaN
5   C23895  ELE SD Card adapter                    ELECTRONIC
6   C23896  Crystal 16Mhz                          NaN
7   C23897  Capacitor 100uF                        ELECTRONIC
8   C23898  ELECTRONICS Resistor 10K               ELECTRONIC
9   C23899  M3x5 Socket Cap Bolt MECH              NaN
10  C23900  M3x6 Socket Cap Bolt Mech              NaN
11  C23901  Mehcanical Assemble Kapton Tape 120mm  NaN
12  C23902  MK7 Filament Drive Block Front         NaN
13  C23903  Pulley 5mm shaft                       NaN
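The same map-based lookup can probably be extended from the first word to every word, which stays close to the split approach described in the question. The following is a minimal sketch on a few rows of the sample data, assuming that the first matching word in each description should decide the TYPE; the variable names s and words are chosen for illustration only.

```python
import pandas as pd

# A few rows from the question's sample data
dataframe1 = pd.DataFrame({
    "PN": ["C23890", "C23891", "C23899", "C23900"],
    "DESC": ["Resistor 2.21K elec", "Powerswitch",
             "M3x5 Socket Cap Bolt MECH", "M3x6 Socket Cap Bolt Mech"],
})
dataframe2 = pd.DataFrame({
    "ITEM/CODE": ["ELE", "ELECTRONICS", "Capacitor", "Resistor", "Washer", "MECH"],
    "TYPE": ["ELECTRONIC", "ELECTRONIC", "ELECTRONIC",
             "ELECTRONIC", "MECHANICAL", "MECHANICAL"],
})

# Lower-cased lookup Series: keyword -> TYPE
s = pd.Series(dataframe2["TYPE"].values,
              index=dataframe2["ITEM/CODE"].str.lower())

# One row per word; explode keeps the original row label in the index
words = dataframe1["DESC"].str.lower().str.split().explode()

# Map every word, then keep the first non-null TYPE per original row
dataframe1["CLASSIFICATION"] = words.map(s).groupby(level=0).first()
print(dataframe1)
```

Unlike the first-word version, this classifies the two "Socket Cap Bolt" rows as MECHANICAL, because the keyword is found at the end of the description.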
Not as fancy, but it should do the job:
import numpy as np
import pandas as pd

# convert the dfType dataframe to a dictionary: {ITEM/CODE: {'TYPE': ...}}
type_dict = dfType.set_index('ITEM/CODE').T.to_dict()

# take a DESC value and return the corresponding TYPE from type_dict (NaN if no word matches)
def map_type(in_str):
    out_str = np.nan
    for val in in_str.split():
        # exact, case-sensitive word match; the last matching word wins
        if val in type_dict:
            out_str = type_dict[val]['TYPE']
    return out_str

# apply the function to the DESC column
dfInput['CLASSIFICATION'] = dfInput['DESC'].apply(map_type)
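Note that this function compares words case-sensitively, so a description ending in "Mech" would not match the code "MECH". A minimal variation is sketched below, assuming that case-insensitive matching is wanted and that the first matching word should win; type_lookup and map_type_ci are hypothetical names used only for this example.

```python
import numpy as np
import pandas as pd

# A few rows from the question's sample data
dfInput = pd.DataFrame({
    "PN": ["C23890", "C23891", "C23900"],
    "DESC": ["Resistor 2.21K elec", "Powerswitch", "M3x6 Socket Cap Bolt Mech"],
})
dfType = pd.DataFrame({
    "ITEM/CODE": ["ELE", "ELECTRONICS", "Capacitor", "Resistor", "Washer", "MECH"],
    "TYPE": ["ELECTRONIC", "ELECTRONIC", "ELECTRONIC",
             "ELECTRONIC", "MECHANICAL", "MECHANICAL"],
})

# Flat dictionary with lower-cased keys: keyword -> TYPE
type_lookup = dict(zip(dfType["ITEM/CODE"].str.lower(), dfType["TYPE"]))

def map_type_ci(desc):
    """Return the TYPE of the first word of desc found in type_lookup, else NaN."""
    if not isinstance(desc, str):
        # missing descriptions (NaN) cannot be split into words
        return np.nan
    for word in desc.lower().split():
        if word in type_lookup:
            return type_lookup[word]
    return np.nan

dfInput["CLASSIFICATION"] = dfInput["DESC"].apply(map_type_ci)
print(dfInput)
```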