How can I check if some rows of a data frame have matches in multiple data frames, sequentially, in pandas?

I have one dataframe with the data below:

id  print_volume
A   100
b   200
c   250

Assume the table above represents books in a library. We want to check, sequentially, whether each book is present with any of its 3 readers. Note that the column names are different in each of these dataframes.

reader 1

name  volume
c     100
A     120
c     250

reader 2

book  vers
A     100
b     220
c     250

reader 3

book_name  print
p          100
b          200
n          250

Expected output:

id  print_volume  present
A   100           2
b   200           3
c   250           1

Here, even though reader 1 and reader 2 both have book c with the same volume, we mark 1 in the present column because we check readers 1, 2 and 3 in that order: once a match is found, we don't look any further.
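The rule described above (the first reader, in the order 1, 2, 3, that has an exact (id, volume) match wins) can be sketched in plain Python before involving pandas. This is only an illustration: first_reader is a hypothetical helper name, and the sets are transcribed from the tables, with reader 2's b volume at 220 (the value the expected output requires).

```python
# Hypothetical helper: 1-based index of the first reader holding (book_id, volume), else None
def first_reader(book_id, volume, readers):
    for reader_id, pairs in enumerate(readers, start=1):
        if (book_id, volume) in pairs:
            return reader_id  # stop at the first match; later readers are not checked
    return None

# Each reader as a set of (book, volume) pairs
reader1 = {('c', 100), ('A', 120), ('c', 250)}
reader2 = {('A', 100), ('b', 220), ('c', 250)}
reader3 = {('p', 100), ('b', 200), ('n', 250)}
readers = [reader1, reader2, reader3]

books = [('A', 100), ('b', 200), ('c', 250)]
present = [first_reader(b, v, readers) for b, v in books]
print(present)  # [2, 3, 1]
```

Because the readers are checked in list order and the loop returns on the first hit, book c gets 1 even though reader 2 also has it.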

This is what I am doing now:

import numpy as np
import pandas as pd

def check_for_book(library_df, reader_df, reader_id):
    subset_to_check = library_df[library_df['present'] == 'not_found']
    subset_to_check = pd.merge(subset_to_check, reader_df, on=<columns>, how='left', indicator='found')
    subset_to_check['present'] = np.where(subset_to_check['found'] == 'both', reader_id, 'not_found')
    return pd.concat([subset_to_check, library_df[library_df['present'] != 'not_found']])

library_df['present'] = 'not_found'
library_df = check_for_book(library_df, reader_df1, '1')
library_df = check_for_book(library_df, reader_df2, '2')
library_df = check_for_book(library_df, reader_df2, '2')

I can't find the bug, and the results I get are inconsistent. Is there a better way to join these data frames?

Thanks
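A sketch of one way to make the step-by-step merge approach work; it is not the asker's final code. In the code as posted, several things can go wrong: the last call passes reader_df2 with id '2' again instead of reader_df3; on=<columns> cannot match directly because each reader uses different column names; the returned frame keeps the merge's found column (plus the reader's columns), and pandas rejects an indicator name that is already a column, so the next call fails; a reader that lists the same book twice would duplicate rows in a left merge; and concat reorders the rows. The version below renames each reader's two columns to id and print_volume, drops duplicate reader rows, and writes results back into the existing present column instead of concatenating. The sample frames are transcribed from the tables above, with reader 2's b volume at 220 (the value the expected output requires).

```python
import numpy as np
import pandas as pd

def check_for_book(library_df, reader_df, reader_id):
    """Mark still-unfound books that appear in reader_df with reader_id."""
    library_df = library_df.copy()
    keys = ['id', 'print_volume']
    # Only books that no earlier reader has matched are still candidates
    mask = library_df['present'] == 'not_found'
    # Give the reader's two columns the library's names so the merge keys line up;
    # dropping duplicates ensures the left merge cannot add rows
    reader = reader_df.set_axis(keys, axis=1).drop_duplicates()
    merged = library_df.loc[mask, keys].merge(
        reader, on=keys, how='left', indicator='found')
    # A left merge keeps the left frame's row order, so the result lines up with mask
    library_df.loc[mask, 'present'] = np.where(
        merged['found'] == 'both', reader_id, 'not_found')
    return library_df

library_df = pd.DataFrame({'id': ['A', 'b', 'c'], 'print_volume': [100, 200, 250]})
reader_df1 = pd.DataFrame({'name': ['c', 'A', 'c'], 'volume': [100, 120, 250]})
reader_df2 = pd.DataFrame({'book': ['A', 'b', 'c'], 'vers': [100, 220, 250]})
reader_df3 = pd.DataFrame({'book_name': ['p', 'b', 'n'], 'print': [100, 200, 250]})

library_df['present'] = 'not_found'
# Check readers 1, 2, 3 in order; each call only looks at books still not found
for reader_id, reader_df in enumerate((reader_df1, reader_df2, reader_df3), start=1):
    library_df = check_for_book(library_df, reader_df, str(reader_id))
print(library_df)
```

Writing into library_df.loc[mask, 'present'] relies on the left merge preserving the left frame's row order and row count, which holds once the reader has no duplicate key rows.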

If you want to check sequentially, row by row, you can use:

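import pandas as pd

result = []
for n in test.values:
    # Check the readers in order (1, 2, 3) and stop at the first one that has this row
    for reader_id, reader in enumerate((df1, df2, df3), start=1):
        if any((n == row).all() for row in reader.values):
            result.append([n[0], n[1], reader_id])
            break
    else:
        # No reader has this book
        result.append([n[0], n[1], None])

final_df = pd.DataFrame(result, columns=['id', 'print_volume', 'present'])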

NOTE: This assumes the first df is named test and the other three are df1, df2 and df3.
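A row-by-row loop runs in Python for every book and every reader row. As an alternative that is not part of the answer above, the same first-match rule can be vectorized: build one boolean mask per reader with MultiIndex.isin, then let np.select pick the first reader whose mask is true. The frame names follow the note above; the data is transcribed from the question (reader 2's b at 220), and using 0 for "no reader has it" is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Sample data transcribed from the question (reader 2's b at 220)
test = pd.DataFrame({'id': ['A', 'b', 'c'], 'print_volume': [100, 200, 250]})
df1 = pd.DataFrame({'name': ['c', 'A', 'c'], 'volume': [100, 120, 250]})
df2 = pd.DataFrame({'book': ['A', 'b', 'c'], 'vers': [100, 220, 250]})
df3 = pd.DataFrame({'book_name': ['p', 'b', 'n'], 'print': [100, 200, 250]})

# Each book as an (id, print_volume) pair
books = pd.MultiIndex.from_frame(test[['id', 'print_volume']])
# One boolean array per reader: is the book's pair among that reader's rows?
# itertuples yields plain tuples, so the readers' differing column names don't matter
conditions = [
    books.isin(list(r.itertuples(index=False, name=None))) for r in (df1, df2, df3)
]
# np.select returns the choice for the FIRST true condition, i.e. the sequential rule
test['present'] = np.select(conditions, [1, 2, 3], default=0)  # 0 = no reader has it
print(test)
```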

Let's try something like:

import pandas as pd

df = pd.DataFrame({
    'id': {0: 'A', 1: 'b', 2: 'c'}, 'print_volume': {0: 100, 1: 200, 2: 250}
})

reader1 = pd.DataFrame({
    'name': {0: 'c', 1: 'A', 2: 'c'}, 'volume': {0: 100, 1: 120, 2: 250}
})
reader2 = pd.DataFrame({
    'book': {0: 'A', 1: 'b', 2: 'c'}, 'vers': {0: 100, 1: 220, 2: 250}
})
reader3 = pd.DataFrame({
    'book_name': {0: 'p', 1: 'b', 2: 'n'}, 'print': {0: 100, 1: 200, 2: 250}
})

readers = []
# Rename Columns so they are uniform with df
# Add indicator to each readers
for i, r_df in enumerate((reader1, reader2, reader3)):
    # set_axis and assign return new frames, so reader1..reader3 are not modified
    readers.append(r_df.set_axis(df.columns, axis=1).assign(present=i + 1))

# Create Readers
readers = pd.concat(readers, axis=0).drop_duplicates(df.columns, keep='first')
# Merge DF and Readers Together
new_df = df.merge(readers, on=df.columns.tolist(), how='left')
print(new_df)

new_df:

  id  print_volume  present
0  A           100        2
1  b           200        3
2  c           250        1

Add an indicator to every reader so that each DataFrame can be identified, then concat them together and drop duplicates so that only the row from the first reader is kept:

readers:

  id  print_volume  present
0  c           100        1
1  A           120        1
2  c           250        1
0  A           100        2
1  b           220        2
0  p           100        3
1  b           200        3
2  n           250        3
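In the example data every book happens to have a match. As a hedged extension of the same concat/drop_duplicates/merge idea (not from the original answer), the sketch below adds a hypothetical book z at volume 300 that no reader has. The left merge leaves present missing for it, which would normally turn the whole column into floats, so the sketch converts it to pandas' nullable Int64 dtype. It also builds the renamed readers with set_axis and assign so reader1..reader3 stay untouched.

```python
import pandas as pd

# 'z' is a hypothetical extra book that no reader has
df = pd.DataFrame({'id': ['A', 'b', 'c', 'z'], 'print_volume': [100, 200, 250, 300]})
reader1 = pd.DataFrame({'name': ['c', 'A', 'c'], 'volume': [100, 120, 250]})
reader2 = pd.DataFrame({'book': ['A', 'b', 'c'], 'vers': [100, 220, 250]})
reader3 = pd.DataFrame({'book_name': ['p', 'b', 'n'], 'print': [100, 200, 250]})

keys = df.columns.tolist()
# Rename each reader's columns to the library's names and tag it with its position
readers = pd.concat(
    [r.set_axis(keys, axis=1).assign(present=i)
     for i, r in enumerate((reader1, reader2, reader3), start=1)],
    ignore_index=True,
).drop_duplicates(keys, keep='first')  # earlier readers win

new_df = df.merge(readers, on=keys, how='left')
# 'z' has no match, so its present value is missing; Int64 keeps the rest as integers
new_df['present'] = new_df['present'].astype('Int64')
print(new_df)
```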
