Python: Partial String matching in pandas column and retrieve the values from other columns in pandas dataframe
I have a string which is a file name:
File_Name = '23092020_indent.xlsx'
Now I have a dataframe as follows:
Id  fileKey      fileSource  fileStringLookup
10  rel_ind      sap_indent  indent
20  dm_material  sap_mm      mater
30  dm_vendor    sap_vm      vendor
Objective: Find the fileKey and fileSource where fileStringLookup matches the file name.
An exact match is not possible, so we may set regex=True. For this I am using the following code snippet:
if tbl_master_file['fileStringLookup'].str.contains(File_Name, regex=True):
    File_Key = np.where(tbl_master_file['fileStringLookup'].str.contains(File_Name, regex=True),
                        tbl_master_file['fileKey'], '')
    File_Source = np.where(tbl_master_file['fileStringLookup'].str.contains(File_Name, regex=True),
                           tbl_master_file['fileSource'], '')
But this does not return any value for File_Key and File_Source. Instead I get the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I investigated further to see whether df['fileStringLookup'].str.contains(File_Name, regex=True) returns True for any row. But it returns False, even for Id=10!
My desired output:
File_Key = 'rel_ind'
File_Source = 'sap_indent'
Am I missing anything?
Your error occurs because your call to str.contains returns a Series of booleans, one for every element of the original Series. The if statement therefore does not know what to check, as the truth value of a Series of booleans is ambiguous. Separately, every entry of that Series is False because str.contains tests whether each fileStringLookup value contains the whole file name, whereas here it is the file name that contains the lookup value.
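To make both points concrete, here is a minimal sketch that rebuilds the table from the question and reproduces the behaviour. The variable names mask, error_message and any_match are illustrative and not part of the original post.

```python
import pandas as pd

# Sample data copied from the question.
tbl_master_file = pd.DataFrame({
    "Id": [10, 20, 30],
    "fileKey": ["rel_ind", "dm_material", "dm_vendor"],
    "fileSource": ["sap_indent", "sap_mm", "sap_vm"],
    "fileStringLookup": ["indent", "mater", "vendor"],
})
File_Name = "23092020_indent.xlsx"

# str.contains asks: does each lookup string contain the whole file name?
# "indent" does not contain "23092020_indent.xlsx", so every entry is False.
mask = tbl_master_file["fileStringLookup"].str.contains(File_Name, regex=True)

# Using the Series directly in `if` raises the ValueError from the question.
error_message = None
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)

# Reducing the Series to a single boolean removes the ambiguity.
any_match = bool(mask.any())
```

Calling .any() (or .all()) collapses the Series into one boolean that an if statement can evaluate, but it does not fix the direction of the substring test, so any_match is still False here.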
I would use DataFrame.iterrows() inside a function, like:
def get_filekey_filesource(filename, df):
    # For each row, test whether the lookup string occurs inside the file name.
    return [{"fileSource": data.loc["fileSource"],
             "fileKey": data.loc["fileKey"]}
            if data.loc["fileStringLookup"] in filename
            else {}
            for index, data in df.iterrows()]
As you can see, this returns a list of dictionaries in which the keys fileSource and fileKey hold their respective values for rows that match, or an empty dict where matching fails.
This looks far from ideal, but it is the best I could come up with. Feedback welcome.
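As one possible alternative to looping with iterrows(), the sketch below builds a boolean mask with Series.apply and selects the matching rows with .loc. It assumes that a plain substring test is sufficient (no regex needed) and that the first matching row is the one wanted. The names mask, matches and first are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
tbl_master_file = pd.DataFrame({
    "Id": [10, 20, 30],
    "fileKey": ["rel_ind", "dm_material", "dm_vendor"],
    "fileSource": ["sap_indent", "sap_mm", "sap_vm"],
    "fileStringLookup": ["indent", "mater", "vendor"],
})
File_Name = "23092020_indent.xlsx"

# Check, row by row, whether the lookup string occurs inside the file name.
mask = tbl_master_file["fileStringLookup"].apply(lambda lookup: lookup in File_Name)

# Keep only the matching rows and the two columns of interest.
matches = tbl_master_file.loc[mask, ["fileKey", "fileSource"]]

if not matches.empty:
    # Assumption: take the first match if several rows qualify.
    first = matches.iloc[0]
    File_Key = first["fileKey"]
    File_Source = first["fileSource"]
else:
    File_Key = File_Source = ""
```

With the sample data, only the row with Id=10 matches, so File_Key is 'rel_ind' and File_Source is 'sap_indent', which is the desired output from the question.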