简体   繁体   中英

How to compare columns of two dataframes to add a mapping

I have two dataframes as shown below, and I'm trying to set value to CLASSIFICATION in dataframe1 based on dataframe2's ITEM/CODE. If DESC has any matching word to ITEM/CODE then I have to get TYPE from dataframe2.

To achieve this, I am splitting the DESC string, and trying to compare the list entries with ITEM/CODE of dataframe2. Any Ideas on how to do this?

dataframe1
PN      DESC                                        CLASSIFICATION
C23890  Resistor 2.21K elec
C23891  Powerswitch
C23892  Resistor 7.5K
C23893  Resistor .1K
C23894  FET elec
C23895  ELE SD Card adapter
C23896  Crystal 16Mhz
C23897  Capacitor 100uF
C23898  ELECTRONICS Resistor 10K
C23899  M3x5 Socket Cap Bolt MECH
C23900  M3x6 Socket Cap Bolt Mech
C23901  Mehcanical Assemble Kapton Tape 120mm
C23902  MK7 Filament Drive Block Front
C23903  Pulley 5mm shaft

dataframe2
ITEM/CODE      TYPE
ELE         ELECTRONIC
ELECTRONICS ELECTRONIC
Capacitor   ELECTRONIC
Resistor    ELECTRONIC
Washer      MECHANICAL
MECH        MECHANICAL

This is what I have coded up so far.


import pandas as pd

fn = 'D:\PartsExport.xlsx'
dfInput = pd.read_excel(fn, 'Sheet1')

fn_type = 'D:\TypeMaster.xlsx'
dfType = pd.read_excel(fn_type, 'Sheet1')

dfInput['DESC_SPLIT'] = dfInput["DESC"].str.split(" ", n=-1, expand = False)

Result

PN      DESC                      CLASSIFICATION
C23890  Resistor 2.21K elec         ELECTRONIC
C23891  Powerswitch                 ELECTRONIC
C23892  Resistor 7.5K               ELECTRONIC
C23893  Resistor .1K                ELECTRONIC
C23899  M3x5 Socket Cap Bolt MECH   MECHANICAL

Use Series.str.contains with looping by Series created by dataframe2 , flags=re.I parameter is for non case matching:

import re

for k, v in dataframe2.set_index('ITEM/CODE')['TYPE'].items():
    #if necessary word boundaries
    pat = r"\b{}\b".format(k)
    #if not
    #pat = k
    dataframe1.loc[dataframe1['DESC'].str.contains(pat, flags=re.I), 'CLASSIFICATION'] = v

print (dataframe1)
        PN                                   DESC CLASSIFICATION
0   C23890                    Resistor 2.21K elec     ELECTRONIC
1   C23891                            Powerswitch            NaN
2   C23892                          Resistor 7.5K     ELECTRONIC
3   C23893                           Resistor .1K     ELECTRONIC
4   C23894                               FET elec            NaN
5   C23895                    ELE SD Card adapter     ELECTRONIC
6   C23896                          Crystal 16Mhz            NaN
7   C23897                        Capacitor 100uF     ELECTRONIC
8   C23898               ELECTRONICS Resistor 10K     ELECTRONIC
9   C23899              M3x5 Socket Cap Bolt MECH     MECHANICAL
10  C23900              M3x6 Socket Cap Bolt Mech     MECHANICAL
11  C23901  Mehcanical Assemble Kapton Tape 120mm            NaN
12  C23902         MK7 Filament Drive Block Front            NaN
13  C23903                       Pulley 5mm shaft            NaN

If want match only first word use Series.map , but first convert both values to lowercase by Series.str.lower :

dataframe2['ITEM/CODE'] = dataframe2['ITEM/CODE'].str.lower()
s = dataframe2.set_index('ITEM/CODE')['TYPE']

dataframe1['CLASSIFICATION'] = dataframe1['DESC'].str.split().str[0].str.lower().map(s)
print (dataframe1)
        PN                                   DESC CLASSIFICATION
0   C23890                    Resistor 2.21K elec     ELECTRONIC
1   C23891                            Powerswitch            NaN
2   C23892                          Resistor 7.5K     ELECTRONIC
3   C23893                           Resistor .1K     ELECTRONIC
4   C23894                               FET elec            NaN
5   C23895                    ELE SD Card adapter     ELECTRONIC
6   C23896                          Crystal 16Mhz            NaN
7   C23897                        Capacitor 100uF     ELECTRONIC
8   C23898               ELECTRONICS Resistor 10K     ELECTRONIC
9   C23899              M3x5 Socket Cap Bolt MECH            NaN
10  C23900              M3x6 Socket Cap Bolt Mech            NaN
11  C23901  Mehcanical Assemble Kapton Tape 120mm            NaN
12  C23902         MK7 Filament Drive Block Front            NaN
13  C23903                       Pulley 5mm shaft            NaN

Not as fancy, but should do the job:

import pandas as pd

#convert dfType dataframe to dictionary
type_dict = dfType.set_index('ITEM/CODE').T.to_dict()

#function that takes in DESC column value and outputs corresponding value from type_dict
def map_type(in_str):
  out_str = np.NaN
  for val in in_str.split():
      if val in type_dict.keys():
         out_str = type_dict[val]['TYPE']
  return out_str

#apply above function to DESC column
dfInput['CLASSIFICATION'] = dfInput['DESC'].apply(map_type)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM