

Searching substring of a dataframe if it exists in another dataframe column

I need some help searching for a string or substring from the CHEMICALS column of dataframe1, checking whether it exists in dataframe2, and then creating a new column in dataframe1 that returns the corresponding chemical name column from dataframe2. Can anyone assist?

Thanks

I'm not 100% sure your question is clear, but I've made an attempt at what I think you're trying to do, with an example. Here we take each element of df1, search df2 for it, and then put a list of every match in df2 into a new column of df1. Let me know if this is what you're expecting:

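import pandas as pd

df1 = pd.DataFrame({'CHEMICALS': ['AAA', 'BBB', 'ccc'],
                   'label': [0.0, 1.0, 0.0]
                   })
df2 = pd.DataFrame({'CHEMICALS': ['DDD', 'BBB_2', 'ccc_2', 'ccc_3'],
                   })

# Create the result column up front as strings so each row can be filled in below
df1['df2_match'] = ''

# For each chemical in df1, join every df2 name that contains it as a literal substring
# (regex=False so characters such as '(' or '+' in chemical names are not read as regex)
for ind1 in df1.index:
    df1.loc[ind1, 'df2_match'] = ', '.join(list(df2[df2['CHEMICALS'].str.contains(df1['CHEMICALS'][ind1], regex=False)]['CHEMICALS']))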

To break this up a little bit:

x1 = df2['CHEMICALS'].str.contains(df1['CHEMICALS'][ind1], regex=False)

This returns a True/False Series indicating, for each item in df2, whether it contains the string at index ind1 of df1.
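A quick check of this step on the example data, assuming ind1 = 2 (the df1 row holding 'ccc'). It also shows why passing regex=False is worthwhile: by default Series.str.contains treats the pattern as a regular expression, so a chemical name containing metacharacters fails instead of matching literally. The name '(+)-limonene' is a hypothetical test entry, not something from the question.

```python
import re

import pandas as pd

df1 = pd.DataFrame({'CHEMICALS': ['AAA', 'BBB', 'ccc']})
df2 = pd.DataFrame({'CHEMICALS': ['DDD', 'BBB_2', 'ccc_2', 'ccc_3']})

ind1 = 2  # the df1 row holding 'ccc'
x1 = df2['CHEMICALS'].str.contains(df1['CHEMICALS'][ind1], regex=False)
print(x1.tolist())  # [False, False, True, True]

# Hypothetical chemical name containing regex metacharacters '(' and '+'
names = pd.Series(['(+)-limonene', 'limonene'])
literal = names.str.contains('(+)-limonene', regex=False)
print(literal.tolist())  # [True, False]

# With the default regex=True the same pattern is an invalid regular expression;
# the exact exception type may depend on the pandas string backend
try:
    names.str.contains('(+)-limonene')
except (re.error, ValueError) as exc:
    print('pattern rejected as regex:', exc)
```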

x2 = df2[x1]['CHEMICALS']

This returns the name of each CHEMICAL in df2 at the positions where the True/False Series is True.
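A small sketch of this step in isolation, with the mask for ind1 = 2 ('ccc') written out by hand. It also shows the equivalent single selection df2.loc[mask, 'CHEMICALS'], which avoids chained indexing; that swap is a style choice, not what the answer itself used.

```python
import pandas as pd

df2 = pd.DataFrame({'CHEMICALS': ['DDD', 'BBB_2', 'ccc_2', 'ccc_3']})

# Boolean mask as produced by x1 for ind1 = 2 ('ccc')
x1 = pd.Series([False, False, True, True], index=df2.index)

# Keep only the rows where the mask is True, then take the CHEMICALS column
x2 = df2[x1]['CHEMICALS']
print(x2.tolist())  # ['ccc_2', 'ccc_3']

# Same selection in one .loc call: rows by boolean mask, column by label
x2_loc = df2.loc[x1, 'CHEMICALS']
print(x2.equals(x2_loc))  # True
```

Note that x2 keeps df2's original index labels (2 and 3), which does not matter here because the values are joined into a single string in the next step.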

x3 = ', '.join(list(x2))

This converts those names into a list and joins them with ', ' in between. The result is then placed in the new column of df1 at the matching index, and the process is repeated for each chemical in df1.
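A short sketch of the join step, including the no-match case. When nothing in df2 contains the search string (as for 'AAA'), the join produces an empty string, which is why the first row of df2_match ends up blank. Since str.join accepts any iterable of strings, the list() call is optional.

```python
import pandas as pd

# The matches for 'ccc', as selected in the previous step
matches = pd.Series(['ccc_2', 'ccc_3'], index=[2, 3], name='CHEMICALS')
x3 = ', '.join(list(matches))
print(x3)  # ccc_2, ccc_3

# Iterating a Series yields its values, so list() is not strictly needed
print(', '.join(matches) == x3)  # True

# No matches (as for 'AAA'): joining an empty selection gives ''
no_matches = pd.Series([], dtype=object, name='CHEMICALS')
print(repr(', '.join(list(no_matches))))  # ''
```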

The output looks like this:

df1
    CHEMICALS   label   df2_match
0   AAA         0.0 
1   BBB         1.0     BBB_2
2   ccc         0.0     ccc_2, ccc_3
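The question mentions returning a "corresponding chemical name column" from dataframe2, which suggests df2 may hold the name in a separate column from the one being searched. A minimal sketch under that assumption: 'NAME' and its values are hypothetical placeholders, and the helper matching_names is illustrative. It also builds the whole column in one assignment with a list comprehension instead of filling it row by row.

```python
import pandas as pd

df1 = pd.DataFrame({'CHEMICALS': ['AAA', 'BBB', 'ccc'],
                    'label': [0.0, 1.0, 0.0]})

# Hypothetical: df2 has a separate 'NAME' column holding the chemical name to return
df2 = pd.DataFrame({'CHEMICALS': ['DDD', 'BBB_2', 'ccc_2', 'ccc_3'],
                    'NAME': ['name_DDD', 'name_BBB_2', 'name_ccc_2', 'name_ccc_3']})


def matching_names(code: str) -> str:
    """Join the NAME of every df2 row whose CHEMICALS contains `code` literally."""
    mask = df2['CHEMICALS'].str.contains(code, regex=False)
    return ', '.join(df2.loc[mask, 'NAME'])


# One assignment for the whole column; rows with no match get ''
df1['df2_name'] = [matching_names(code) for code in df1['CHEMICALS']]
print(df1)
```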

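Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.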

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM