
Python: Partial String matching in pandas column and retrieve the values from other columns in pandas dataframe

I have a file name stored as a string: File_Name = '23092020_indent.xlsx'

Now I have a dataframe as follows:

Id   fileKey       fileSource   fileStringLookup
10   rel_ind       sap_indent   indent
20   dm_material   sap_mm       mater
30   dm_vendor     sap_vm       vendor

Objective: find the fileKey and fileSource of the row whose fileStringLookup matches the file name.

An exact match is not possible, so I set regex=True.

For this I am using the following code snippet:

if tbl_master_file['fileStringLookup'].str.contains(File_Name,regex=True):
    File_Key = np.where(tbl_master_file['fileStringLookup'].str.contains(File_Name,regex=True),\
                        tbl_master_file['fileKey'],'')
    File_Source = np.where(tbl_master_file['fileStringLookup'].str.contains(File_Name,regex=True),\
                        tbl_master_file['fileSource'],'')

But this does not return any value for File_Key and File_Source. Instead I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I investigated further to see whether df['fileStringLookup'].str.contains(File_Name,regex=True) returns True for any row. It returns False, even for Id=10!

My desired output:

File_Key = 'rel_ind'
File_Source = 'sap_indent'

Am I missing anything?

Your error occurs because str.contains returns a Series of booleans, one for each element of the original Series. The if statement cannot decide what to check, because the truth value of a Series of booleans is ambiguous.
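To make this concrete, here is a minimal sketch that rebuilds the table from the question (the column values are copied from it; the names mask, any_match and error_message are only illustrative). It shows that the mask holds one boolean per row, that using it directly in an if raises the ValueError, and that it has to be reduced explicitly, for example with .any(). It also suggests why every row is False: the test asks whether each short lookup string contains the whole file name, which can never be the case here.

```python
import pandas as pd

# Table from the question, rebuilt by hand for illustration.
tbl_master_file = pd.DataFrame({
    "Id": [10, 20, 30],
    "fileKey": ["rel_ind", "dm_material", "dm_vendor"],
    "fileSource": ["sap_indent", "sap_mm", "sap_vm"],
    "fileStringLookup": ["indent", "mater", "vendor"],
})
File_Name = "23092020_indent.xlsx"

# One boolean per row: does the lookup string contain the file name?
mask = tbl_master_file["fileStringLookup"].str.contains(File_Name, regex=True)

# Using the Series directly in an `if` raises the error from the question.
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)

# Reducing the Series to a single boolean removes the ambiguity.
# It is False here because "indent" does not contain "23092020_indent.xlsx".
any_match = mask.any()
```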

I would use DataFrame.iterrows() inside a function, like this:

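def get_filekey_filesource(filename, df):
    # For each row, test whether the lookup string occurs in the file name
    # (not the other way round); non-matching rows yield an empty dict.
    return [{"fileSource": data.loc["fileSource"],
             "fileKey": data.loc["fileKey"]}
            if data.loc["fileStringLookup"] in filename
            else {}
            for index, data in df.iterrows()]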

As you can see, this returns a list of dictionaries in which the keys fileSource and fileKey hold their respective values for rows that match, and an empty dict for rows that do not.

This looks far from ideal, but it is the best I could come up with. Feedback welcome.
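As one possible alternative, the sketch below builds a boolean mask by checking whether each fileStringLookup value occurs inside the file name, then selects the matching rows. It is not the only way to do this. The table is rebuilt by hand from the question, and falling back to empty strings when nothing matches is an assumption based on the '' default in the question's np.where calls.

```python
import pandas as pd

# Table from the question, rebuilt by hand for illustration.
tbl_master_file = pd.DataFrame({
    "Id": [10, 20, 30],
    "fileKey": ["rel_ind", "dm_material", "dm_vendor"],
    "fileSource": ["sap_indent", "sap_mm", "sap_vm"],
    "fileStringLookup": ["indent", "mater", "vendor"],
})
File_Name = "23092020_indent.xlsx"

# True for rows whose lookup string is a substring of the file name.
mask = tbl_master_file["fileStringLookup"].apply(lambda s: s in File_Name)
matches = tbl_master_file.loc[mask, ["fileKey", "fileSource"]]

if not matches.empty:
    # Take the first matching row; several rows could match in general.
    first = matches.iloc[0]
    File_Key = first["fileKey"]
    File_Source = first["fileSource"]
else:
    # Assumed fallback, mirroring the '' default in the question.
    File_Key, File_Source = "", ""
```

With the sample data only the row with Id 10 matches, so File_Key is 'rel_ind' and File_Source is 'sap_indent'.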
