
How to compare columns of two dataframes to add a mapping

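I have two dataframes, shown below, and I am trying to fill the CLASSIFICATION column of dataframe1 based on the ITEM/CODE column of dataframe2. If DESC contains a word that matches an ITEM/CODE, I need to take the corresponding TYPE from dataframe2.

To do this, I split the DESC string and tried to compare the resulting list entries against the ITEM/CODE values of dataframe2. Any ideas on how to do this?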

dataframe1
PN      DESC                                        CLASSIFICATION
C23890  Resistor 2.21K elec
C23891  Powerswitch
C23892  Resistor 7.5K
C23893  Resistor .1K
C23894  FET elec
C23895  ELE SD Card adapter
C23896  Crystal 16Mhz
C23897  Capacitor 100uF
C23898  ELECTRONICS Resistor 10K
C23899  M3x5 Socket Cap Bolt MECH
C23900  M3x6 Socket Cap Bolt Mech
C23901  Mehcanical Assemble Kapton Tape 120mm
C23902  MK7 Filament Drive Block Front
C23903  Pulley 5mm shaft

dataframe2
ITEM/CODE      TYPE
ELE         ELECTRONIC
ELECTRONICS ELECTRONIC
Capacitor   ELECTRONIC
Resistor    ELECTRONIC
Washer      MECHANICAL
MECH        MECHANICAL

This is the code I have written so far.


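import pandas as pd

# raw strings keep the backslashes in Windows paths from being read as escapes
fn = r'D:\PartsExport.xlsx'
dfInput = pd.read_excel(fn, 'Sheet1')

fn_type = r'D:\TypeMaster.xlsx'
dfType = pd.read_excel(fn_type, 'Sheet1')

# split each description on spaces into a list of words
dfInput['DESC_SPLIT'] = dfInput["DESC"].str.split(" ", n=-1, expand=False)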

Result

PN      DESC                      CLASSIFICATION
C23890  Resistor 2.21K elec         ELECTRONIC
C23891  Powerswitch                 ELECTRONIC
C23892  Resistor 7.5K               ELECTRONIC
C23893  Resistor .1K                ELECTRONIC
C23899  M3x5 Socket Cap Bolt MECH   MECHANICAL

Use Series.str.contains in a loop over a Series created from dataframe2. The flags=re.I parameter makes the match case-insensitive:

import re

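for k, v in dataframe2.set_index('ITEM/CODE')['TYPE'].items():
    # match whole words only; re.escape guards against regex metacharacters in k
    pat = r"\b{}\b".format(re.escape(k))
    # to match substrings as well, use instead:
    # pat = re.escape(k)
    # na=False treats missing DESC values as non-matches
    dataframe1.loc[dataframe1['DESC'].str.contains(pat, flags=re.I, na=False), 'CLASSIFICATION'] = v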

print (dataframe1)
        PN                                   DESC CLASSIFICATION
0   C23890                    Resistor 2.21K elec     ELECTRONIC
1   C23891                            Powerswitch            NaN
2   C23892                          Resistor 7.5K     ELECTRONIC
3   C23893                           Resistor .1K     ELECTRONIC
4   C23894                               FET elec            NaN
5   C23895                    ELE SD Card adapter     ELECTRONIC
6   C23896                          Crystal 16Mhz            NaN
7   C23897                        Capacitor 100uF     ELECTRONIC
8   C23898               ELECTRONICS Resistor 10K     ELECTRONIC
9   C23899              M3x5 Socket Cap Bolt MECH     MECHANICAL
10  C23900              M3x6 Socket Cap Bolt Mech     MECHANICAL
11  C23901  Mehcanical Assemble Kapton Tape 120mm            NaN
12  C23902         MK7 Filament Drive Block Front            NaN
13  C23903                       Pulley 5mm shaft            NaN
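
The loop above runs one str.contains pass per ITEM/CODE. If the mapping table grows, a single combined pattern may be cheaper. The following is only a sketch of that idea, not part of the original answer: the sample frames are a cut-down copy of the question's data, and the names lookup, keys, pattern and matched are made up for illustration. Note one behavioural difference: str.extract keeps the first keyword found in each DESC, whereas the loop keeps whichever matching ITEM/CODE comes last in dataframe2.

```python
import re

import pandas as pd

# cut-down sample data taken from the question
dataframe1 = pd.DataFrame({
    "PN": ["C23890", "C23891", "C23895", "C23899", "C23900"],
    "DESC": [
        "Resistor 2.21K elec",
        "Powerswitch",
        "ELE SD Card adapter",
        "M3x5 Socket Cap Bolt MECH",
        "M3x6 Socket Cap Bolt Mech",
    ],
})
dataframe2 = pd.DataFrame({
    "ITEM/CODE": ["ELE", "ELECTRONICS", "Capacitor", "Resistor", "Washer", "MECH"],
    "TYPE": ["ELECTRONIC", "ELECTRONIC", "ELECTRONIC", "ELECTRONIC",
             "MECHANICAL", "MECHANICAL"],
})

# lowercase keyword -> TYPE lookup (hypothetical helper name)
lookup = dict(zip(dataframe2["ITEM/CODE"].str.lower(), dataframe2["TYPE"]))

# one alternation of all keywords, longest first, wrapped in word boundaries
keys = sorted(lookup, key=len, reverse=True)
pattern = r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b"

# pull out the first keyword found in each DESC, ignoring case
matched = dataframe1["DESC"].str.extract(pattern, flags=re.I, expand=False)

# normalise the matched text and translate it to its TYPE; no match stays NaN
dataframe1["CLASSIFICATION"] = matched.str.lower().map(lookup)
print(dataframe1)
```

Rows with no keyword, such as Powerswitch, are left as NaN, the same as in the loop version.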

If you only want to match on the first word, use Series.map, but first convert both sides to lowercase with Series.str.lower:

dataframe2['ITEM/CODE'] = dataframe2['ITEM/CODE'].str.lower()
s = dataframe2.set_index('ITEM/CODE')['TYPE']

dataframe1['CLASSIFICATION'] = dataframe1['DESC'].str.split().str[0].str.lower().map(s)
print (dataframe1)
        PN                                   DESC CLASSIFICATION
0   C23890                    Resistor 2.21K elec     ELECTRONIC
1   C23891                            Powerswitch            NaN
2   C23892                          Resistor 7.5K     ELECTRONIC
3   C23893                           Resistor .1K     ELECTRONIC
4   C23894                               FET elec            NaN
5   C23895                    ELE SD Card adapter     ELECTRONIC
6   C23896                          Crystal 16Mhz            NaN
7   C23897                        Capacitor 100uF     ELECTRONIC
8   C23898               ELECTRONICS Resistor 10K     ELECTRONIC
9   C23899              M3x5 Socket Cap Bolt MECH            NaN
10  C23900              M3x6 Socket Cap Bolt Mech            NaN
11  C23901  Mehcanical Assemble Kapton Tape 120mm            NaN
12  C23902         MK7 Filament Drive Block Front            NaN
13  C23903                       Pulley 5mm shaft            NaN

Not so fancy, but it should do the job:

import numpy as np
import pandas as pd

# convert the dfType dataframe to a dictionary: {ITEM/CODE: {'TYPE': ...}}
type_dict = dfType.set_index('ITEM/CODE').T.to_dict()

# take a DESC value and return the matching TYPE from type_dict
# (exact, case-sensitive word match; the last matching word wins)
def map_type(in_str):
    out_str = np.nan
    for val in in_str.split():
        if val in type_dict:
            out_str = type_dict[val]['TYPE']
    return out_str

# apply the function above to the DESC column
dfInput['CLASSIFICATION'] = dfInput['DESC'].apply(map_type)
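
The same word-by-word lookup can probably be done without a Python-level function by exploding the split words and mapping them in one go. This is a rough sketch rather than a drop-in replacement: the sample frames stand in for the Excel files, the names lookup, tokens and types are illustrative, and it assumes a case-insensitive match is wanted (the function above is case-sensitive). It also keeps the first matching word in each DESC, not the last.

```python
import pandas as pd

# small stand-ins for the two Excel sheets in the question
dfInput = pd.DataFrame({
    "PN": ["C23890", "C23891", "C23894", "C23898", "C23900"],
    "DESC": [
        "Resistor 2.21K elec",
        "Powerswitch",
        "FET elec",
        "ELECTRONICS Resistor 10K",
        "M3x6 Socket Cap Bolt Mech",
    ],
})
dfType = pd.DataFrame({
    "ITEM/CODE": ["ELE", "ELECTRONICS", "Capacitor", "Resistor", "Washer", "MECH"],
    "TYPE": ["ELECTRONIC", "ELECTRONIC", "ELECTRONIC", "ELECTRONIC",
             "MECHANICAL", "MECHANICAL"],
})

# Series indexed by lowercase ITEM/CODE, valued by TYPE
lookup = dfType.assign(key=dfType["ITEM/CODE"].str.lower()).set_index("key")["TYPE"]

# one row per word, keeping the original row label as the index
tokens = dfInput["DESC"].str.lower().str.split().explode()

# translate each word to a TYPE (NaN when the word is not a known code)
types = tokens.map(lookup)

# collapse back to one value per original row: the first non-missing TYPE
dfInput["CLASSIFICATION"] = types.groupby(level=0).first()
print(dfInput)
```

Because the assignment aligns on the index, this relies on dfInput having unique row labels, which holds for a freshly loaded sheet.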

