Finding string matches between two DataFrames

Question

I have a DataFrame as below.

DF1:

   A
Any Match
Credit
I need a debit card.
Logging
Awesome

I have another DataFrame as below:

DF2:

          B
I did not find any match.
I want a credit card.
I need a debit card.
I do not know.
I am logging into credit portal.

I need my output as:

              B                           A
     I did not find any match.        Any Match
     I want a credit card.            Credit
     I need a debit card.             I need a debit card.
     I am logging into credit portal. logging,credit

If a phrase from DF1 occurs in any of the texts in DF2, the output should show that text together with the matching phrase(s).

Answer 1

Try fuzzywuzzy:

import pandas as pd
from fuzzywuzzy import fuzz

matched_entities = []

# Compare every phrase in df1["A"] with every sentence in df2["B"].
# (DataFrame.get_value no longer exists in current pandas, so iterate the columns directly.)
for name1 in df1["A"]:
    for name2 in df2["B"]:
        # partial_ratio returns a score from 0 to 100
        matched_token = fuzz.partial_ratio(name1, name2)
        if matched_token > 80:
            matched_entities.append([name1, name2])

df_partial_ratio = pd.DataFrame(columns=['A', 'B'], data=matched_entities)

If fuzz.partial_ratio does not work well on your data, try fuzz.ratio or fuzz.token_sort_ratio. Either one can be swapped in by changing a single line of the code above:

matched_token=fuzz.ratio(name1,name2)

or

matched_token=fuzz.token_sort_ratio(name1,name2)
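
To see why the choice of scorer matters for short phrases inside longer sentences, here is a rough standard-library sketch. It is not fuzzywuzzy's implementation; it only uses difflib.SequenceMatcher to contrast a whole-string score with a "best substring window" score. The helper names full_ratio and best_window_ratio are made up for this illustration.

```python
from difflib import SequenceMatcher


def full_ratio(a, b):
    """Similarity of the two whole strings on a 0-100 scale."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_window_ratio(phrase, text):
    """Best score of `phrase` against any equally long slice of `text`."""
    if len(phrase) > len(text):
        phrase, text = text, phrase
    width = len(phrase)
    return max(
        full_ratio(phrase, text[start:start + width])
        for start in range(len(text) - width + 1)
    )


phrase = "credit"
sentence = "i want a credit card."

whole = full_ratio(phrase, sentence)            # penalised by the extra words
windowed = best_window_ratio(phrase, sentence)  # finds the exact slice "credit"
print(whole, windowed)
```

The whole-string score stays well below the 80 threshold used above because most of the sentence has no counterpart in the phrase, while the windowed score reaches 100. Both strings are lowercased here by hand; whether you need to do the same with a given fuzzywuzzy scorer is worth checking on your own data.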

Answer 2

You could do something like this. First, define a lookup function that matches "normalized" text, e.g. lowercased:

def lookup(x, values):
    for value in values:
        if value.lower() in x.lower():
            return value

Then apply this function to your DF2:

df2['A'] = df2['B'].apply(lambda x: lookup(x, df1['A']))

Which should give you:

    B                                  A
0   I did not find any match.          Any Match
1   I want a credit card.              Credit
2   I need a debit card.               I need a debit card.
3   I do not know.                     None
4   I am logging into credit portal.   Credit
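
Note that lookup returns only the first phrase it finds, so a sentence containing several phrases gets just one of them. A possible variant that collects every matching phrase and drops unmatched rows is sketched below. The name lookup_all, the result frame, and the choice to order hits by where they appear in the sentence are my own assumptions, not part of the question; the sample frames are rebuilt from the question's data.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["Any Match", "Credit", "I need a debit card.",
                          "Logging", "Awesome"]})
df2 = pd.DataFrame({"B": ["I did not find any match.",
                          "I want a credit card.",
                          "I need a debit card.",
                          "I do not know.",
                          "I am logging into credit portal."]})


def lookup_all(text, phrases):
    """Return all phrases found in text (case-insensitive), comma-joined."""
    text_lower = text.lower()
    hits = [p for p in phrases if p.lower() in text_lower]
    # Order the hits by their position in the sentence
    hits.sort(key=lambda p: text_lower.find(p.lower()))
    return ",".join(hits) if hits else None


result = df2.copy()
result["A"] = result["B"].apply(lambda x: lookup_all(x, df1["A"]))
# Keep only the sentences that matched at least one phrase
result = result.dropna(subset=["A"]).reset_index(drop=True)
print(result)
```

The phrases keep the spelling they have in DF1, so the last row comes out as "Logging,Credit" rather than the lowercase form shown in the question.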

Answer 3

Try this:

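# Start with an empty (object-typed) column so strings can be stored in it
df1['B'] = None

# For each phrase in df1, record the first df2 sentence that contains it
for i in range(len(df1)):
    for j in range(len(df2)):
        if df1['A'][i].lower() in df2['B'][j].lower():
            # Write on the phrase's own row so A and B stay aligned
            df1.loc[i, 'B'] = df2['B'][j]
            break

# dropna returns a new DataFrame, so reassign to keep the result
df1 = df1.dropna(axis=0)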

Output:

                      A                                 B
0             Any Match         I did not find any match.
1                Credit             I want a credit card.
2  I need a debit card.              I need a debit card.
3               Logging  I am logging into credit portal.
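
The nested loops above can also be replaced by a single regular expression built from the DF1 phrases. This is only a sketch under a few assumptions of mine: the phrases are treated as literal text (hence re.escape), matching is case-insensitive, and longer phrases are tried first so a full-sentence phrase wins over a shorter phrase at the same position. The names pattern, found and out are illustrative, and the sample frames are rebuilt from the question's data.

```python
import re

import pandas as pd

df1 = pd.DataFrame({"A": ["Any Match", "Credit", "I need a debit card.",
                          "Logging", "Awesome"]})
df2 = pd.DataFrame({"B": ["I did not find any match.",
                          "I want a credit card.",
                          "I need a debit card.",
                          "I do not know.",
                          "I am logging into credit portal."]})

# Longest phrases first, each escaped so that "." is matched literally
phrases = sorted(df1["A"], key=len, reverse=True)
pattern = re.compile("|".join(re.escape(p) for p in phrases), re.IGNORECASE)

# One list of matched substrings per sentence, in order of appearance
found = df2["B"].apply(pattern.findall)

out = df2.assign(A=found.apply(",".join))
# Drop the sentences that matched nothing
out = out[found.apply(bool)].reset_index(drop=True)
print(out)
```

Because findall returns the text as it appears in the sentence, the last row comes out as "logging,credit", which is the form the question asks for.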
