Check if a string contains another string from a different DataFrame - Python

I have two DataFrames with different columns and sizes.

The first one has several columns, one of which is a string field (column 1). The second DataFrame has two columns: a string field (column 4) holding two words separated by a comma, and an integer field (column 5).

I need to check whether column 1 of DataFrame 1 contains the words from column 4 of DataFrame 2 and, where it does, fill in DataFrame 1 with the corresponding information from DataFrame 2.

Example:

df1
    column 1                                column 2          column 3
0   bla bla sample1 bla bla sample2         a                 f
1   bla bla sample1 bla bla sample5         b                 g
2   bla bla sample3 bla bla sample4         c                 h
3   bla bla sample8 bla bla sample7         d                 i
4   bla bla sample1 bla bla sample2         e                 j

df2
    column 4                       column 5
0   ('sample1', 'sample2')         50
1   ('sample3', 'sample4')         35
2   ('sample1', 'sample5')         18

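For anyone who wants to reproduce the example, the two frames could be built roughly as follows. This is only a sketch: it assumes column 4 holds Python tuples of two strings, which is how the sample is printed, although the description above calls it a comma-separated string field.

```python
import pandas as pd

# Sample data copied from the tables above.
df1 = pd.DataFrame({
    'column 1': [
        'bla bla sample1 bla bla sample2',
        'bla bla sample1 bla bla sample5',
        'bla bla sample3 bla bla sample4',
        'bla bla sample8 bla bla sample7',
        'bla bla sample1 bla bla sample2',
    ],
    'column 2': list('abcde'),
    'column 3': list('fghij'),
})

# Assumption: each cell of column 4 is a tuple of two words.
df2 = pd.DataFrame({
    'column 4': [
        ('sample1', 'sample2'),
        ('sample3', 'sample4'),
        ('sample1', 'sample5'),
    ],
    'column 5': [50, 35, 18],
})
```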
The output I need:
df1
    column 1                               column 2  column 3  column 4                     column 5
0   bla bla sample1 bla bla sample2        a         f         ('sample1', 'sample2')     50
1   bla bla sample1 bla bla sample5        b         g         ('sample1', 'sample5')     18
2   bla bla sample3 bla bla sample4        c         h         ('sample3', 'sample4')     35
3   bla bla sample8 bla bla sample7        d         i         NaN                        NaN
4   bla bla sample1 bla bla sample2        e         j         ('sample1', 'sample2')     50

Any ideas?

Thanks!

I can't guarantee this will be particularly fast, but it gets the job done. We'll use set logic to check for matches: a row matches when the set of words in column 1 contains both words of a pair. Because a row can match more than one pair, we have to jump through some hoops to store a list of the matching tuples in each cell, which I don't think is a particularly good idea.

import numpy as np
import pandas as pd

# Set of whitespace-separated words in each row of column 1
df1['setc'] = df1['column 1'].str.split().apply(set)
# Initialize so addition works
df1['column 4'] = [[] for _ in range(len(df1))]
df1['column 5'] = 0

for idx, row in df2.iterrows():
    pair = set(row['column 4'])
    # True for rows whose word set contains every word of this pair
    m = (df1.setc.values & pair) == pair
    # Append the tuple to the lists of the matching rows
    # (relies on df1 having a default 0..n-1 index)
    df1.loc[m, 'column 4'] += pd.Series([[row['column 4']] for _ in range(len(m))])[m]
    df1.loc[m, 'column 5'] += row['column 5']

df1 = df1.drop(columns='setc')
# NaN where nothing matched (np.NaN was removed in NumPy 2.0, so use np.nan)
df1.loc[df1['column 4'].str.len().eq(0), ['column 4', 'column 5']] = np.nan

Output (row 0 of df1 was changed here so that it matches two pairs):

                          column 1 column 2 column 3                                  column 4  column 5
0  bla bla sample1 sample5 sample2        a        f  [(sample1, sample2), (sample1, sample5)]      68.0
1  bla bla sample1 bla bla sample5        b        g                      [(sample1, sample5)]      18.0
2  bla bla sample3 bla bla sample4        c        h                      [(sample3, sample4)]      35.0
3  bla bla sample8 bla bla sample7        d        i                                       NaN       NaN
4  bla bla sample1 bla bla sample2        e        j                      [(sample1, sample2)]      50.0
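
If only a single match per row is wanted, as in the output requested in the question, a simpler variant might keep just the first pair that matches. The sketch below is one way to do that, not a tuned solution: the function name `attach_first_match` and its parameters are made up for illustration, it assumes column 4 holds tuples of words, and it only matches whole whitespace-separated words.

```python
import numpy as np
import pandas as pd


def attach_first_match(df1, df2, text_col='column 1',
                       pair_col='column 4', value_col='column 5'):
    """Hypothetical helper: copy df1 and add the first df2 row whose
    words all occur in text_col (NaN where no pair matches)."""
    # Pre-compute a set for each pair so the subset test is cheap.
    pairs = [(set(p), p, v) for p, v in zip(df2[pair_col], df2[value_col])]

    def lookup(text):
        words = set(text.split())
        for pair_set, pair, value in pairs:
            if pair_set <= words:  # every word of the pair is present
                return pair, value
        return np.nan, np.nan

    matches = [lookup(text) for text in df1[text_col]]
    out = df1.copy()
    # dtype=object keeps each tuple as a single cell value.
    out[pair_col] = pd.Series([m[0] for m in matches],
                              index=out.index, dtype=object)
    out[value_col] = [m[1] for m in matches]
    return out


# Sample data from the question.
df1 = pd.DataFrame({
    'column 1': [
        'bla bla sample1 bla bla sample2',
        'bla bla sample1 bla bla sample5',
        'bla bla sample3 bla bla sample4',
        'bla bla sample8 bla bla sample7',
        'bla bla sample1 bla bla sample2',
    ],
    'column 2': list('abcde'),
    'column 3': list('fghij'),
})
df2 = pd.DataFrame({
    'column 4': [('sample1', 'sample2'), ('sample3', 'sample4'),
                 ('sample1', 'sample5')],
    'column 5': [50, 35, 18],
})

result = attach_first_match(df1, df2)
print(result)
```

When a row matches several pairs, this keeps whichever pair comes first in df2, so the order of df2 matters. Column 5 comes back as floats because unmatched rows hold NaN.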
