
Filter on dataframe based on multiple features of another dataframe

The situation:
I have two datasets:

  • df1: contains the sensor data and the machine ID, logged every minute
  • df2: contains the production unit IDs, the machine ID, and the starting and ending datetime of each unit

df1:
[image not available]

df2:
[image not available]

My task is to filter only on the production timeframes of the machines. This means that, based on the production datetimes in df2 (the timeframes between start and stop), I need to extract the relevant sensor data from df2 (sensor data is logged in df2 every minute, whether or not there is production).

The problem:
I was able to write code that filters on the time intervals in df2, but I can't figure out how to filter on the machine ID as well.
Here is my working code, containing only the datetime filtering:

for index, row in df1.iterrows():
    mask = ((df2.index >= row['Start']) & (df2.index <= row['Stop']))
    df2.loc[mask, 'Sarzs_no'] = row['Sarzs_no']
    df2.loc[mask, 'Output'] = row['Output']

Here is my attempt to add the "Unit" (= machine ID) filtering to the datetime filtering:

for index, row in df1.iterrows():
    mask = ((df1.index >= row['Start']) & (df1.index <= row['Stop']) & (row['Unit']==df1.Unit))
    df1.loc[mask, 'Sarzs_no'] = row['Sarzs_no']
    df1.loc[mask, 'Output'] = row['Output']

Unfortunately, the above code does not work.

Questions:

  1. Could you please let me know what I am doing wrong?
  2. Could you please let me know how I can add a filter condition on the machine ID (column "Unit") as well?

Thank you in advance for your help!

I wanted to post this as a comment, but I don't have enough reputation to do so. As initial hints:

1) Try checking your keys. Unit in your first df has a different pattern than in your second. You may need to transform one or the other, e.g. before looping:

df1["Unit"] = df1["Unit"].apply(lambda x: x.split('_')[1]) # K2_110 -> 110
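
As a rough way to confirm that the keys line up after such a transformation, the sketch below normalizes the key column and lists the values that have no counterpart. The frame names `units` and `sensors` and the sample values are hypothetical, since the real data is only shown as screenshots, and it assumes both "Unit" columns hold strings.

```python
import pandas as pd

# Hypothetical sample data: the real frames are not shown in the post.
units = pd.DataFrame({"Unit": ["K2_110", "K2_120"]})
sensors = pd.DataFrame({"Unit": ["110", "120", "130"]})

# Vectorized equivalent of the apply/lambda above: keep the part after "_".
units["Unit"] = units["Unit"].str.split("_").str[1]

# Keys that appear in the sensor data but in no production unit.
unmatched = set(sensors["Unit"]) - set(units["Unit"])
print(unmatched)
```

If `unmatched` contains keys you expected to match, the two columns probably still differ in pattern or dtype (for example strings versus integers).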

2) In your example you iterate through the first dataframe and apply the mask to the first dataframe as well, rather than to the second:

df1.loc[mask, 'Sarzs_no'] = row['Sarzs_no']
df1.loc[mask, 'Output'] = row['Output']
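
Putting both hints together, one possible shape of the loop is sketched below: iterate over the production units, but build and apply the mask on the sensor data, with the machine ID as a third condition. The frame names `units` and `sensors`, the `value` column, and all sample values are assumptions made for illustration. The column names "Unit", "Start", "Stop", "Sarzs_no" and "Output" come from the question, and the "Unit" keys are assumed to be already normalized to the same pattern.

```python
import pandas as pd

# Hypothetical production units; column names follow the post, values are invented.
units = pd.DataFrame({
    "Unit": ["110", "120"],
    "Start": pd.to_datetime(["2020-01-01 00:01", "2020-01-01 00:02"]),
    "Stop": pd.to_datetime(["2020-01-01 00:02", "2020-01-01 00:03"]),
    "Sarzs_no": ["A1", "B7"],
    "Output": [5.0, 8.0],
})

# Hypothetical sensor log: one row per machine per minute, datetime index.
idx = pd.date_range("2020-01-01 00:00", periods=5, freq="min")
sensors = pd.DataFrame(
    {"Unit": ["110"] * 5 + ["120"] * 5, "value": range(10)},
    index=idx.append(idx),
)

# Create the target columns up front so rows outside production stay empty.
sensors["Sarzs_no"] = None
sensors["Output"] = float("nan")

for _, row in units.iterrows():
    # Mask is built on the sensor frame, not on the frame being iterated.
    mask = (
        (sensors.index >= row["Start"])
        & (sensors.index <= row["Stop"])
        & (sensors["Unit"] == row["Unit"]).to_numpy()
    )
    sensors.loc[mask, "Sarzs_no"] = row["Sarzs_no"]
    sensors.loc[mask, "Output"] = row["Output"]

# Keep only the sensor rows that fall inside a production timeframe.
production = sensors.dropna(subset=["Sarzs_no"])
print(production)
```

The mask is converted to a plain boolean array so that it is applied by position, which keeps the assignment safe when several machines share the same timestamps in the index.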
