Search for matching substrings in a dataframe

Question

I am trying to use my df as a lookup table, to determine whether my string contains a value from that df. A simple example:

import pandas as pd

str = 'John Smith Business Analyst'
df = pd.read_pickle('job_titles.pickle')

The df would be a single column holding several job titles.

df = accountant, lawyer, CFO, business analyst, etc.

Now I need some way to determine that str contains the substring 'Business Analyst', because that value is in my df.

The result returned would be the substring 'Business Analyst'.

If the original str were:

str = 'John Smith Business'

then the result would be empty, since no substring matches a string in the df.
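A minimal sketch of the behaviour described above. It assumes the job titles live in a column named `title` (the question does not name the column) and that matching should ignore case, since the example df lists `business analyst` in lower case while the string contains `Business Analyst`. The helper `find_title` is hypothetical, not part of pandas.

```python
import re

import pandas as pd


def find_title(text: str, titles: pd.Series) -> str:
    """Hypothetical helper: return the first title found in text, or '' if none."""
    # Try longer titles first so a multi-word title wins over a shorter overlapping one
    for title in sorted(titles.dropna(), key=len, reverse=True):
        # Case-insensitive literal search; re.escape stops '.', '+' etc. acting as regex syntax
        match = re.search(re.escape(title), text, flags=re.IGNORECASE)
        if match:
            # Return the substring as it is spelled in the original text
            return match.group(0)
    return ''


df = pd.DataFrame({'title': ['accountant', 'lawyer', 'CFO', 'business analyst']})
print(find_title('John Smith Business Analyst', df['title']))  # Business Analyst
print(repr(find_title('John Smith Business', df['title'])))    # ''
```

This is plain substring matching, so a short title could also match inside a longer word; adding word boundaries to the pattern is one way to tighten it.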

I have it working when the value is a single word. For example:

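import pandas as pd

df = pd.read_pickle('cities.pickle')
# df['name'] holds values such as Calgary, Edmonton, Toronto, etc.

str = 'John Smith Business Analyst Calgary AB Canada'
str_list = str.split()

# str.match tests whether each name starts with the word (as a regex);
# stop at the first word that hits a city name
for word in str_list:
    df_location = df[df['name'].str.match(word)]
    if not df_location.empty:
        break

# df_location now holds the single row for Calgary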

The city is found in the df and that one row is returned. I'm just not sure how to do this when the value is more than one word.
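One way to extend the word-by-word loop to multi-word values is to generate every contiguous run of words (n-grams) from the string and look each one up in a set built from the df column. This is a sketch, assuming the column is named `name` as in the city code above and that words are separated by whitespace without attached punctuation; `ngram_lookup` is a hypothetical helper. Unlike `str.match`, which treats each word as a regular expression anchored at the start of every name, the set lookup requires the whole value to match (case-insensitively here).

```python
import pandas as pd


def ngram_lookup(text: str, values: pd.Series) -> str:
    """Hypothetical helper: return the longest run of consecutive words equal to a value, or ''."""
    # Normalise the lookup values once so each check is a set-membership test
    lookup = {v.lower() for v in values.dropna()}
    words = text.split()
    # Try longer runs first so 'Business Analyst' beats any single-word match
    for n in range(len(words), 0, -1):
        for i in range(len(words) - n + 1):
            candidate = ' '.join(words[i:i + n])
            if candidate.lower() in lookup:
                return candidate
    return ''


cities = pd.DataFrame({'name': ['Calgary', 'Edmonton', 'Toronto']})
titles = pd.DataFrame({'name': ['accountant', 'lawyer', 'CFO', 'business analyst']})

text = 'John Smith Business Analyst Calgary AB Canada'
print(ngram_lookup(text, titles['name']))  # Business Analyst
print(ngram_lookup(text, cities['name']))  # Calgary
```

The matched value can then be used to select the row, for example `cities[cities['name'].str.lower() == match.lower()]`. The number of candidates grows quadratically with the word count, which is fine for short strings like these.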

Answer 1

I am not sure exactly what you want to do with the returned value, but here is at least one way to identify it. First, I made a toy dataframe:

import pandas as pd

titles_df = pd.DataFrame({'title' : ['Business Analyst', 'Data Scientist', 'Plumber', 'Baker', 'Accountant', 'CEO']})

search_name = 'John Smith Business Analyst'

titles_df

              title
0  Business Analyst
1    Data Scientist
2           Plumber
3             Baker
4        Accountant
5               CEO

Then I loop through the values in the title column to see whether any of them appear in the search string:

for val in titles_df['title'].values:
    if val in search_name:
        print(val)
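If you would rather collect the matches than print them, the same containment test can build a list, or a boolean mask that selects the matching rows of `titles_df`. A sketch reusing the toy data above; note that `in` is a plain, case-sensitive substring test.

```python
import pandas as pd

titles_df = pd.DataFrame({'title': ['Business Analyst', 'Data Scientist', 'Plumber',
                                    'Baker', 'Accountant', 'CEO']})
search_name = 'John Smith Business Analyst'

# Plain Python list of every title that occurs in the search string
matches = [val for val in titles_df['title'] if val in search_name]
print(matches)  # ['Business Analyst']

# Boolean mask over the title column; indexing with it returns the matching rows
mask = titles_df['title'].apply(lambda t: t in search_name)
print(titles_df[mask])
```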

If you want to do this for all the names in a dataframe column and store each matched title in a new column, you can do the following:

First, I create a dataframe with some names:

names_df = pd.DataFrame({'name' : ['John Smith Business Analyst', 'Dorothy Roberts CEO', 'Jim Miller Dancer', 'Samuel Adams Accountant']})

Then I loop through the name values and the title values, and assign each matched title to a title column in the names dataframe (unmatched rows keep an empty string):

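names_df['title'] = ''
for name in names_df['name'].values:
    for title in titles_df['title'].values:
        if title in name:
            # .loc writes to names_df itself; chained indexing
            # (names_df['title'][mask] = ...) may only modify a copy
            names_df.loc[names_df['name'] == name, 'title'] = title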

names_df
                          name             title
0  John Smith Business Analyst  Business Analyst
1          Dorothy Roberts CEO               CEO
2            Jim Miller Dancer                  
3      Samuel Adams Accountant        Accountant
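The nested loop checks every name/title pair. A common vectorized alternative (not part of the original answer) is to join the escaped titles into a single regular-expression alternation and let `Series.str.extract` pull out the first match in each row. This sketch sorts the titles longest-first, so a longer title wins when two could match at the same position, and wraps the pattern in `\b` word boundaries, an assumption about the desired behaviour, so that a title such as `CEO` does not match inside a longer word.

```python
import re

import pandas as pd

titles_df = pd.DataFrame({'title': ['Business Analyst', 'Data Scientist', 'Plumber',
                                    'Baker', 'Accountant', 'CEO']})
names_df = pd.DataFrame({'name': ['John Smith Business Analyst', 'Dorothy Roberts CEO',
                                  'Jim Miller Dancer', 'Samuel Adams Accountant']})

# Escape regex metacharacters in each title; longest first so longer alternatives are tried first
titles = sorted(titles_df['title'], key=len, reverse=True)
pattern = r'\b(' + '|'.join(map(re.escape, titles)) + r')\b'

# extract returns NaN where nothing matched; fill with '' to mirror the loop version
names_df['title'] = names_df['name'].str.extract(pattern, expand=False).fillna('')
print(names_df)
```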

Answer 1 (accepted, 2019-11-25 10:34:26)

在数据框中搜索匹配的子字符串

问题描述

1 个解决方案

解决方案1 1 已采纳 2019-11-25 10:34:26

解决方案1
1 已采纳 2019-11-25 10:34:26