How to compare each row from one dataframe against all the rows from another dataframe and calculate a distance measure?

I have two different customer dataframes, and I would like to match their rows using a Jaccard distance matrix or any other method.

df1

    Name     country  cost
0    raj  Kazakhstan    23
1    sam      Russia   243
2  kanan     Belarus     2
3    Nan         Nan     0

df2

   Name     country   DOB
0   rak  Kazakhstan   12-12-1903
1   sim      russia   03-04-1994
2   raj     Belarus   21-09-2003
3  kane     Belarus   23-12-1999

Output:

If the string similarity score is greater than 0.6, I would like to combine the two matching rows into a new dataframe.

Df3

    Name     country  Name     country  cost         DOB
0    raj  Kazakhstan   rak  Kazakhstan    23  12-12-1903
1    sam      Russia   sim      russia   243  03-04-1994
2  kanan     Belarus  kane     Belarus     2  23-12-1999

I tried computing the score one row against one row, but I don't know how to compare each row of one dataframe against all the rows of the other dataframe.

You could use fuzzywuzzy:

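from fuzzywuzzy import process

# Build one string key per row from the text columns only
# (summing across all columns fails when a numeric column such as cost is present)
df1['key'] = df1['Name'].astype(str) + df1['country'].astype(str)
df2['key'] = df2['Name'].astype(str) + df2['country'].astype(str)

# Candidate keys from df2, built once instead of on every call
choices = df2['key'].tolist()


def yoursource(x):
    # extractOne returns the best (match, score) pair; the score runs from 0 to 100,
    # so 60 corresponds to the 0.6 threshold in the question
    match, score = process.extractOne(x, choices)
    return match if score > 60 else 'notmatch'


# Replace each df1 key with its best-matching df2 key (or 'notmatch')
df1['key'] = df1['key'].apply(yoursource)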

After that, merge the two dataframes on the matched key. The inner join drops the rows marked 'notmatch':

df = df1.merge(df2, on='key', how='inner').drop(columns='key')
df
  Name_x   country_x Name_y   country_y
0    raj  Kazakhstan    rak  Kazakhstan
1    sam      Russia    sim      russia
2  kanan     Belarus   kane     Belarus
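
Since the question mentions Jaccard distance, here is a rough sketch of the all-pairs comparison without fuzzywuzzy. It assumes pandas 1.2 or later (for `how="cross"`) and uses a hand-written Jaccard similarity on character bigrams; the helper names `bigrams` and `jaccard`, the `row_1` column and the `_1`/`_2` suffixes are made up for illustration. Bigram Jaccard scores are not the same numbers fuzzywuzzy produces, so the 0.6 cutoff may need tuning on real data.

```python
import pandas as pd


def bigrams(text):
    """Return the set of lowercase character bigrams in text."""
    s = str(text).lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| on character bigrams (0.0 to 1.0)."""
    x, y = bigrams(a), bigrams(b)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


df1 = pd.DataFrame({
    "Name": ["raj", "sam", "kanan", "Nan"],
    "country": ["Kazakhstan", "Russia", "Belarus", "Nan"],
    "cost": [23, 243, 2, 0],
})
df2 = pd.DataFrame({
    "Name": ["rak", "sim", "raj", "kane"],
    "country": ["Kazakhstan", "russia", "Belarus", "Belarus"],
    "DOB": ["12-12-1903", "03-04-1994", "21-09-2003", "23-12-1999"],
})

# Cross join: every row of df1 paired with every row of df2.
# row_1 remembers which df1 row each pair came from.
pairs = df1.assign(row_1=range(len(df1))).merge(
    df2, how="cross", suffixes=("_1", "_2")
)

# Score each pair on the "Name country" strings of both sides
pairs["score"] = [
    jaccard(n1 + " " + c1, n2 + " " + c2)
    for n1, c1, n2, c2 in zip(
        pairs["Name_1"], pairs["country_1"], pairs["Name_2"], pairs["country_2"]
    )
]

# Keep the best-scoring df2 row for each df1 row, then apply the threshold
best = pairs.loc[pairs.groupby("row_1")["score"].idxmax()]
df3 = best[best["score"] > 0.6].drop(columns="row_1").reset_index(drop=True)
print(df3)
```

The cross join produces len(df1) * len(df2) rows, so this is only practical for small to medium frames; for large ones, blocking on a cheap key first (for example the lowercased country) would cut the number of pairs.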
