
Python: Partial String matching in pandas column and retrieve the values from other columns in pandas dataframe

I have a string which is a file name: File_Name = 23092020_indent.xlsx

Now I have a dataframe as follows:

Id   fileKey       fileSource   fileStringLookup
10   rel_ind       sap_indent   indent
20   dm_material   sap_mm       mater
30   dm_vendor     sap_vm       vendor

Objective: find the fileKey and fileSource of the row whose fileStringLookup matches part of the file name.

An exact match is not possible, hence we may set regex=True.

For this I am using the following code snippet:

if tbl_master_file['fileStringLookup'].str.contains(File_Name,regex=True):
    File_Key = np.where(tbl_master_file['fileStringLookup'].str.contains(File_Name,regex=True),\
                        tbl_master_file['fileKey'],'')
    File_Source = np.where(tbl_master_file['fileStringLookup'].str.contains(File_Name,regex=True),\
                        tbl_master_file['fileSource'],'')

But this does not return any value for File_Key and File_Source. Instead I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I investigated further to see whether df['fileStringLookup'].str.contains(File_Name,regex=True) returns True for any row. But it returns False, even for Id=10!

My desired output:

File_Key = 'rel_ind'
File_Source = 'sap_indent'

Am I missing anything?

Your error occurs because your call to str.contains returns a Series of booleans, one for every element of the original Series. The if statement therefore does not know what to check, because the truth value of a Series of booleans is ambiguous.
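
To make this concrete, here is a minimal sketch that rebuilds the table from the question. The DataFrame construction is my assumption about how the data is stored; the column and variable names come from the question. It also suggests why the mask is False for every row: str.contains(File_Name) asks whether each lookup string contains the whole file name, which is the reverse of the intended check.

```python
import pandas as pd

File_Name = "23092020_indent.xlsx"

# Assumed reconstruction of the table shown in the question
tbl_master_file = pd.DataFrame({
    "Id": [10, 20, 30],
    "fileKey": ["rel_ind", "dm_material", "dm_vendor"],
    "fileSource": ["sap_indent", "sap_mm", "sap_vm"],
    "fileStringLookup": ["indent", "mater", "vendor"],
})

# str.contains yields one boolean per row, not a single True/False,
# so "if mask:" raises the ValueError from the question.
mask = tbl_master_file["fileStringLookup"].str.contains(File_Name, regex=True)

# Reduce the Series to a single boolean before using it in an if statement
any_match = bool(mask.any())

# Reversed direction: is the lookup string a substring of the file name?
reversed_mask = tbl_master_file["fileStringLookup"].apply(lambda s: s in File_Name)
```

Here any_match is False because no lookup string contains the full file name, while reversed_mask is True only for the row with Id 10.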

I would use df.iterrows() inside a function, like:

def get_filekey_filesource(filename, df):
    # One dict per row: filled when the row's lookup string occurs in filename, else empty
    return [{"fileSource": data.loc["fileSource"],
             "fileKey": data.loc["fileKey"]}
            if data.loc["fileStringLookup"] in filename
            else {}
            for index, data in df.iterrows()]

As you can see, this returns a list of dictionaries in which the keys fileSource and fileKey hold the respective values for rows that match, or an empty dict where matching fails.

This looks far from ideal, but it is the best I could come up with. Feedback welcome.
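
One possible alternative, sketched under assumptions: if only the first matching row is needed, a boolean mask avoids the list of empty dictionaries. The helper name lookup_file is hypothetical, and the DataFrame below is an assumed reconstruction of the table in the question.

```python
import pandas as pd

def lookup_file(filename, df):
    """Return (fileKey, fileSource) for the first row whose
    fileStringLookup occurs in filename, or None if no row matches."""
    # Plain substring test per row; no regex involved
    mask = df["fileStringLookup"].apply(lambda s: s in filename)
    matches = df.loc[mask, ["fileKey", "fileSource"]]
    if matches.empty:
        return None
    first = matches.iloc[0]
    return first["fileKey"], first["fileSource"]

# Assumed reconstruction of the table shown in the question
tbl_master_file = pd.DataFrame({
    "Id": [10, 20, 30],
    "fileKey": ["rel_ind", "dm_material", "dm_vendor"],
    "fileSource": ["sap_indent", "sap_mm", "sap_vm"],
    "fileStringLookup": ["indent", "mater", "vendor"],
})

result = lookup_file("23092020_indent.xlsx", tbl_master_file)
if result is not None:
    File_Key, File_Source = result
```

Returning None for the no-match case keeps the caller's check explicit. If several rows can match, returning the whole matches frame instead of its first row may be more appropriate.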


