String matching between 2 dataframes

I'm learning Python, and any help on this is much appreciated. My scenario: there are 2 dataframes, A and B, each with two columns, Name and Flag. The Name column holds a list of names.

ExDF = pd.DataFrame({'Name' : ['Smith','John, Alex','Peter Lin','Carl Marx','Abhraham Moray','Calvin Klein'], 'Flag':['False','False','False','False','False','False']})

SnDF = pd.DataFrame({'Name' : ['Adam K ','John Smith','Peter Lin','Carl Josh','Abhraham Moray','Tim Klein'], 'Flag':['False','False','False','False','False','False']})

The initial value of Flag is False.

Point 1: I need to flip the names in both dataframes, i.e. Adam Smith becomes Smith Adam, and save the flipped names in a new column in both dataframes. This part is done.

Point 2: Then both the original and the flipped names of dataframe A should be checked against the original and flipped names of dataframe B. If a match is found, the Flag column in both dataframes should be updated to True.

I wrote the code, but it compares the dataframes row by row, like A[0] to B[0] and A[1] to B[1], whereas I need to check each record of A (A[0], for example) against all the records of B.

Please help me with this!

The code I tried is below:

import numpy as np

import pandas as pd

from sklearn.feature_extraction.text import CountVectorizer

ExDF_swap = ExDF["Swap"] = ExDF["Name"].apply(lambda x: " ".join(reversed(x.split()))) 
SnDF_swap = SnDF["Swap"] = SnDF["Name"].apply(lambda x: " ".join(reversed(x.split()))) 
ExDF_swap =  pd.DataFrame(ExDF_swap)
SnDF_swap =  pd.DataFrame(SnDF_swap)

vect = CountVectorizer()
X = vect.fit_transform(ExDF_swap.Name)
Y = vect.transform(SnDF_swap.Name)

res = np.ravel(np.any((X.dot(Y.T) > 1).todense(), axis=1))
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
pd.DataFrame(X.toarray(), columns=vect.get_feature_names_out())
pd.DataFrame(Y.toarray(), columns=vect.get_feature_names_out())

ExDF["Flag"] = np.ravel(np.any((X.dot(Y.T) > 1).todense(), axis=1))
SnDF["Flag"] = np.ravel(np.any((X.dot(Y.T) > 1).todense(), axis=1))

You could try pandas' isin():

import pandas as pd

ExDF = pd.DataFrame({'Name' : ['Smith','John, Alex','Peter Lin','Carl Marx','Abhraham Moray','Calvin Klein'], 'Flag':['False','False','False','False','False','False']})
SnDF = pd.DataFrame({'Name' : ['Adam K ','John Smith','Peter Lin','Carl Josh','Abhraham Moray','Tim Klein'], 'Flag':['False','False','False','False','False','False']})

print(ExDF)
print(SnDF)

ExDF["Swap"] = ExDF["Name"].apply(lambda x: " ".join(reversed(x.split())))
SnDF["Swap"] = SnDF["Name"].apply(lambda x: " ".join(reversed(x.split())))

print(ExDF)
print(SnDF)

ExDF['Flag'] = ExDF.Name.isin(SnDF.Name)
SnDF['Flag'] = SnDF.Name.isin(ExDF.Name)

print(ExDF)
print(SnDF)
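
Note that the snippet above builds the Swap columns but only compares Name against Name. A possible extension that also checks the flipped names is sketched below. The helper names flip and flag_matches are made up for illustration, and stripping surrounding whitespace (as in 'Adam K ') is an assumption about what should count as the same name. Punctuation such as the comma in 'John, Alex' is left untouched.

```python
import pandas as pd


def flip(name: str) -> str:
    """Reverse the whitespace-separated words: 'Adam Smith' -> 'Smith Adam'."""
    return " ".join(reversed(name.split()))


def flag_matches(a: pd.DataFrame, b: pd.DataFrame) -> None:
    """Set Flag to True in both frames (in place) where a row's original or
    flipped name equals any original or flipped name in the other frame."""
    for df in (a, b):
        # Assumption: leading/trailing spaces are noise, not part of the name.
        df["Name"] = df["Name"].str.strip()
        df["Swap"] = df["Name"].apply(flip)

    # Pool original and flipped names so every row is checked against ALL rows
    # of the other frame, not just the row at the same position.
    a_names = set(a["Name"]) | set(a["Swap"])
    b_names = set(b["Name"]) | set(b["Swap"])

    a["Flag"] = a["Name"].isin(b_names) | a["Swap"].isin(b_names)
    b["Flag"] = b["Name"].isin(a_names) | b["Swap"].isin(a_names)


ExDF = pd.DataFrame({"Name": ["Smith", "John, Alex", "Peter Lin",
                              "Carl Marx", "Abhraham Moray", "Calvin Klein"]})
SnDF = pd.DataFrame({"Name": ["Adam K ", "John Smith", "Peter Lin",
                              "Carl Josh", "Abhraham Moray", "Tim Klein"]})
flag_matches(ExDF, SnDF)
print(ExDF)
print(SnDF)

# A flipped name in one frame matches the original in the other.
left = pd.DataFrame({"Name": ["Adam Smith", "Peter Lin"]})
right = pd.DataFrame({"Name": ["Smith Adam ", "Carl Josh"]})
flag_matches(left, right)
print(left)
print(right)
```

With the sample data from the question, only 'Peter Lin' and 'Abhraham Moray' end up flagged in both dataframes. 'Smith' does not match 'John Smith', because isin() compares whole strings. Partial or fuzzy matching would need a different approach.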
