
How to do a partial match check across strings in two different pandas columns?

I have two dataframes, one with 300 names and one with 2000. I want to check whether all of the words in each of the 300 names are contained in any of the 2000 names, regardless of word order. For example:

Name 1: Mark, Alex, Smith

Name 2: Mark, Joseph, Smith, Alex, the, first

Dataframe 1

Name 1
'Mark', 'Alex', 'Smith'

Dataframe 2

Name 2
'Mark', 'Joseph', 'Alex', 'Smith', 'the', 'First'

As you can see, the column in dataframe 2 contains all of the words from the column in dataframe 1, plus additional words.

My query should match here, because Name 2 contains all of the words from Name 1 even though it is not an exact match. Each of the names is split into individual words in each cell.

Ideally, I would run a function across dataframe 2, which contains 2,000 names, and see whether any of those names contain all of the words from a name in dataframe 1.

Edit: Someone kindly pointed out in the comments that what I am really asking is whether I can find out if Name 1 is a subset of Name 2.

Assuming that each of your dataframes has a column containing lists of strings:

>>> import pandas as pd
>>> df1 = pd.DataFrame({
...     "Name 1": [['Mark', 'Alex', 'Smith'], ['S1', 'S2', 'S3']],
... })
>>> df2 = pd.DataFrame({
...     "Name 2": [['Mark', 'Joseph', 'Alex', 'Smith', 'the', 'First'], ['S3', 'S4', 'S5']],
... })

You could first merge both dataframes on their index, which pairs the rows by position:

>>> df = pd.merge(df1, df2, left_index=True, right_index=True)
>>> print(df)
                Name 1                                   Name 2
0  [Mark, Alex, Smith]  [Mark, Joseph, Alex, Smith, the, First]
1         [S1, S2, S3]                             [S3, S4, S5]

Now apply this function to check whether the subset condition is True for each row:

>>> df = df.apply(lambda row: set(row["Name 1"]).issubset(set(row["Name 2"])), axis=1)
>>> df
0     True
1    False
dtype: bool
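
Note that the index merge above only compares row 0 with row 0, row 1 with row 1, and so on. If the goal is to test each of the 300 names against all 2,000 names, as the question describes, something like the following may be closer. This is only a minimal sketch: the third row in `df2`, the helper `matching_rows`, and the `matches` and `has_match` columns are made up for illustration and are not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name 1": [["Mark", "Alex", "Smith"], ["S1", "S2", "S3"]],
})
df2 = pd.DataFrame({
    "Name 2": [
        ["Mark", "Joseph", "Alex", "Smith", "the", "First"],
        ["S3", "S4", "S5"],
        ["Smith", "Alex", "Mark"],  # hypothetical extra row, same words in another order
    ],
})

# Build each Name 2 set once so it is not rebuilt for every comparison.
name2_sets = [set(words) for words in df2["Name 2"]]


def matching_rows(words):
    """Return the df2 index labels whose words include every word in `words`."""
    needed = set(words)
    return [
        idx
        for idx, candidate in zip(df2.index, name2_sets)
        if needed <= candidate  # `<=` on sets is the subset test
    ]


# For each Name 1, list every df2 row that contains all of its words.
df1["matches"] = df1["Name 1"].apply(matching_rows)
# True when at least one df2 row matched.
df1["has_match"] = df1["matches"].apply(len) > 0

print(df1[["matches", "has_match"]])
```

With 300 and 2,000 names this is about 600,000 subset checks, which plain Python sets should handle comfortably. The comparison is case-sensitive, so 'first' and 'First' count as different words unless you normalize them beforehand.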
