Python Pandas: How to groupby and compare columns
Here is my dataframe 'df':
match          name                    group
adamant        Adamant Home Network    86
adamant        ADAMANT, Ltd.           86
adamant bild   TOV Adamant-Bild        86
360works       360WORKS                94
360works       360works.com            94
For each group number I want to compare the names one by one and see whether they are matched to the same word from the 'match' column.
So the desired output is a set of counts: if the names match we count it as 'TP', and if not we count it as 'FN'.
I had the idea of counting the number of match words per group number, but that does not fully give me what I want:
df.groupby('group').count()
Does anybody have an idea how to do it?
If I understood your question correctly, this should do the job:
import re
import pandas

df = pandas.DataFrame([['adamant', 'Adamant Home Network', 86], ['adamant', 'ADAMANT, Ltd.', 86],
                       ['adamant bild', "TOV Adamant-Bild", 86], ['360works', '360WORKS', 94],
                       ['360works ', "360works.com ", 94]], columns=['match', 'name', 'group'])

def my_function(group):
    for i, row in group.iterrows():
        # keep only the letters of each value, lowercased, and test for inclusion
        if ''.join(re.findall("[a-zA-Z]+", row['match'])).lower() not in ''.join(
                re.findall("[a-zA-Z]+", row['name'])).lower():
            # if one of the inclusions fails, we return 'FN'
            return 'FN'
    # if all inclusions succeed, we return 'TP'
    return 'TP'

res_series = df.groupby('group').apply(my_function)
res_series.name = 'count'
res_df = res_series.reset_index()
print(res_df)
This will give you this DataFrame:
   group count
0     86    TP
1     94    TP
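The function above returns a single label per group, whereas the question asks for counts. A minimal sketch of one way to get per-group counts is shown here, reusing the same letters-only inclusion test. The helper names letters_only and label_row and the 'label' column are hypothetical, introduced only for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame(
    [
        ["adamant", "Adamant Home Network", 86],
        ["adamant", "ADAMANT, Ltd.", 86],
        ["adamant bild", "TOV Adamant-Bild", 86],
        ["360works", "360WORKS", 94],
        ["360works", "360works.com", 94],
    ],
    columns=["match", "name", "group"],
)

def letters_only(text):
    """Keep only the ASCII letters of text, lowercased (hypothetical helper)."""
    return "".join(re.findall("[a-zA-Z]+", text)).lower()

def label_row(row):
    """Label a row 'TP' if the normalized match is contained in the normalized name."""
    return "TP" if letters_only(row["match"]) in letters_only(row["name"]) else "FN"

# One label per row, then count the labels inside each group
df["label"] = df.apply(label_row, axis=1)
counts = (
    df.groupby(["group", "label"]).size().unstack(fill_value=0)
    .reindex(columns=["TP", "FN"], fill_value=0)  # keep both columns even if one label never occurs
)
print(counts)
```

With the sample data every row passes the inclusion test, so group 86 has 3 'TP' and group 94 has 2 'TP', with no 'FN' in either group. Note that the test ignores digits and punctuation, so it may be too permissive for other data.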
This function compares the name and match columns row by row for each supplied group (temp below is the DataFrame from the question):
def apply_func(df):
    x = df['name'] == df['match']
    return x.map({False:'FIN', True:'TP'})
In [683]: temp.join(temp.groupby('group').apply(apply_func).reset_index(), rsuffix='_1', how='left')
Out[683]:
          match                  name  group  group_1  level_1    0
0       adamant  Adamant Home Network     86       86        0  FIN
1       adamant         ADAMANT, Ltd.     86       86        1  FIN
2  adamant bild      TOV Adamant-Bild     86       86        2  FIN
3      360works              360WORKS     94       94        3  FIN
4      360works          360works.com     94       94        4  FIN
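Every row comes out as a non-match here because == is an exact, case-sensitive comparison, so even '360works' and '360WORKS' differ. A minimal sketch of a case-insensitive variant follows. It works on whole columns, so no groupby or join is needed, and it uses the question's 'FN' label. The 'result' column name is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

temp = pd.DataFrame(
    [
        ["adamant", "Adamant Home Network", 86],
        ["adamant", "ADAMANT, Ltd.", 86],
        ["adamant bild", "TOV Adamant-Bild", 86],
        ["360works", "360WORKS", 94],
        ["360works", "360works.com", 94],
    ],
    columns=["match", "name", "group"],
)

# Strip surrounding whitespace and lowercase both columns before comparing
same = temp["name"].str.strip().str.lower() == temp["match"].str.strip().str.lower()

# 'TP' where the normalized values are equal, 'FN' otherwise
temp["result"] = np.where(same, "TP", "FN")
print(temp)
```

Only the '360WORKS' row becomes 'TP' under this rule, since the other names contain extra words or characters. Whether exact equality or containment is the right test depends on what should count as a match.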