Python Pandas: How to group and compare columns

Question

Here is my DataFrame 'df':

match           name                   group  
adamant         Adamant Home Network   86   
adamant         ADAMANT, Ltd.          86   
adamant bild    TOV Adamant-Bild       86   
360works        360WORKS               94   
360works        360works.com           94

For each group number, I want to compare the names one by one and check whether they match the same word in the 'match' column.

The desired output is a pair of counts: each match counts as a 'TP', and each mismatch counts as an 'FN'.

I thought of counting the number of match words per group number, but that does not fully give me what I want:

df.groupby('group').count()

Does anybody have an idea how to do this?

Answer 1

If I understood your question correctly, this should do the job:

import re
import pandas


df = pandas.DataFrame([['adamant', 'Adamant Home Network', 86], ['adamant', 'ADAMANT, Ltd.', 86],
                       ['adamant bild', "TOV Adamant-Bild", 86], ['360works', '360WORKS', 94],
                       ['360works', '360works.com', 94]], columns=['match', 'name', 'group'])


def my_function(group):
    for i, row in group.iterrows():
        # Keep only the letters of 'match' and 'name', lowercase them,
        # and check that the first is contained in the second.
        if ''.join(re.findall("[a-zA-Z]+", row['match'])).lower() not in ''.join(
                re.findall("[a-zA-Z]+", row['name'])).lower():
            # If any row in the group fails the check, the group is 'FN'.
            return 'FN'
    # If every row passes the check, the group is 'TP'.
    return 'TP'


res_series = df.groupby('group').apply(my_function)
res_series.name = 'count'
res_df = res_series.reset_index()
print(res_df)

This will give you this DataFrame:

   group count
0     86    TP
1     94    TP
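
The function above yields one label per group, while the question asks for counts. A minimal sketch of per-group counts follows, assuming the same letters-only, lowercase containment check; the helper `letters_only`, the `label` column, and the use of `pandas.crosstab` are illustrative choices and not part of the original answer.

```python
import re

import pandas as pd

df = pd.DataFrame(
    [
        ["adamant", "Adamant Home Network", 86],
        ["adamant", "ADAMANT, Ltd.", 86],
        ["adamant bild", "TOV Adamant-Bild", 86],
        ["360works", "360WORKS", 94],
        ["360works", "360works.com", 94],
    ],
    columns=["match", "name", "group"],
)


def letters_only(text):
    # Hypothetical helper: keep ASCII letters only and lowercase them.
    return "".join(re.findall("[a-zA-Z]+", text)).lower()


# Label each row: 'TP' when the normalized match is contained in the normalized name.
is_match = [
    letters_only(m) in letters_only(n) for m, n in zip(df["match"], df["name"])
]
df["label"] = ["TP" if ok else "FN" for ok in is_match]

# Count labels per group; a label that never occurs still gets a column of zeros.
counts = pd.crosstab(df["group"], df["label"]).reindex(
    columns=["TP", "FN"], fill_value=0
)
print(counts)
```

With the sample data, group 86 has 3 'TP' rows, group 94 has 2, and neither group has an 'FN' row.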

Answer 2

This function compares the name and match columns row by row within each supplied group (here `temp` holds the DataFrame from the question):

def apply_func(df):
    # Exact string equality between 'name' and 'match', one label per row
    x = df['name'] == df['match']
    return x.map({False:'FN', True:'TP'})

In [683]: temp.join(temp.groupby('group').apply(apply_func).reset_index(), rsuffix='_1', how='left')
Out[683]: 
           match                  name  group  group_1  level_1   0
0        adamant  Adamant Home Network     86       86        0  FN
1        adamant         ADAMANT, Ltd.     86       86        1  FN
2  adamant bild       TOV Adamant-Bild     86       86        2  FN
3       360works              360WORKS     94       94        3  FN
4       360works          360works.com     94       94        4  FN
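
Every row above is a mismatch because `==` on the raw strings is case-sensitive and requires the whole name to equal the match word. A hedged sketch of two looser comparisons follows, assuming that trimming whitespace and lowercasing is acceptable normalization; the column names `exact` and `contains` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["adamant", "Adamant Home Network", 86],
        ["adamant", "ADAMANT, Ltd.", 86],
        ["adamant bild", "TOV Adamant-Bild", 86],
        ["360works", "360WORKS", 94],
        ["360works", "360works.com", 94],
    ],
    columns=["match", "name", "group"],
)

# Normalize both columns before comparing: trim whitespace and lowercase.
name_norm = df["name"].str.strip().str.lower()
match_norm = df["match"].str.strip().str.lower()

# Case-insensitive equality: the whole name must equal the match word.
df["exact"] = (name_norm == match_norm).map({True: "TP", False: "FN"})

# Case-insensitive containment: the match word only has to appear in the name.
df["contains"] = [
    "TP" if m in n else "FN" for m, n in zip(match_norm, name_norm)
]

print(df[["match", "name", "group", "exact", "contains"]])
```

On the sample data, case-insensitive equality marks only '360WORKS' as 'TP', while containment marks every row as 'TP' except 'TOV Adamant-Bild', whose hyphen does not match the space in 'adamant bild'. Stripping punctuation first, as Answer 1 does, would close that gap.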


Answer 1: 1, accepted, 2015-04-29 15:17:13

Answer 2: 1, 2015-04-29 16:12:11
