
Filter on dataframe based on multiple features of another dataframe


I have two datasets:

  • df1: contains the sensor data and the machine ID, logged every minute
  • df2: contains the production unit IDs, the machine ID, and the start and end datetimes of the units

df1:
[image not available]

df2:
[image not available]

My task is to keep only the sensor data recorded during the production timeframes of the machines. That is, based on the production datetimes in df2 (the timeframes between Start and Stop), I need to select the relevant sensor data from df2 (sensor data is logged in df2 every minute, whether or not there is production).


I was able to write code that filters on the time intervals in df2, but I can't figure out how to filter on the machine ID as well.
Here is my working code, which contains only the datetime filtering:

for index, row in df1.iterrows():
    mask = ((df2.index >= row['Start']) & (df2.index <= row['Stop']))
    df2.loc[mask, 'Sarzs_no'] = row['Sarzs_no']
    df2.loc[mask, 'Output'] = row['Output']

Here is my attempt to add the "Unit"(=machine ID) filtering as well to the datetime filtering:

for index, row in df1.iterrows():
    mask = ((df1.index >= row['Start']) & (df1.index <= row['Stop']) & (row['Unit']==df1.Unit))
    df1.loc[mask, 'Sarzs_no'] = row['Sarzs_no']
    df1.loc[mask, 'Output'] = row['Output']

Unfortunately, the above code does not work.

  1. Could you please let me know what I am doing wrong?
  2. Could you please let me know how I can add a filter condition on the machine ID as well (column "Unit")?

Thank you for your help in advance!

I wanted to post this as a comment, but I don't have enough reputation to do this. As initial hints:

1) Try checking your keys. Unit in your first df has a different pattern than in your second, so you may need to transform one or the other, e.g. before looping:

df1["Unit"] = df1["Unit"].apply(lambda x: x.split('_')[1]) # K2_110 -> 110
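A quick way to confirm that the keys line up after the transformation is to compare the sets of Unit values in both frames. This is only a sketch: the sample values are made up, and only the "K2_110" pattern comes from the comment above.

```python
import pandas as pd

# Hypothetical Unit values; only the "K2_110" pattern is taken from the post.
df1 = pd.DataFrame({"Unit": ["K2_110", "K2_120", "K2_110"]})
df2 = pd.DataFrame({"Unit": ["110", "120", "130"]})

# Vectorised equivalent of the apply/lambda: keep the part after the underscore.
df1["Unit"] = df1["Unit"].str.split("_").str[1]

# Units present in both frames; an empty set means a Unit condition can never match.
common_units = set(df1["Unit"]) & set(df2["Unit"])
print(sorted(common_units))
```

If the two Unit columns had different dtypes (for example strings in one frame and integers in the other), the intersection would also come out empty, so this check is worth running before any looping.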

2) In your example you iterate through the first dataframe and apply the mask to that same dataframe as well, rather than to df2 as in your working code:

df1.loc[mask, 'Sarzs_no'] = row['Sarzs_no']
df1.loc[mask, 'Output'] = row['Output']
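Putting both hints together, the loop could look roughly like the sketch below. It is not tested against the original data: the frame names `production` and `sensors`, the `Value` column, and all sample values are assumptions made for illustration, while `Unit`, `Start`, `Stop`, `Sarzs_no` and `Output` follow the question. It also assumes the Unit keys already share the same format in both frames.

```python
import pandas as pd

# Hypothetical sensor log: one row per minute per machine, indexed by timestamp.
sensors = pd.DataFrame(
    {"Unit": ["110", "110", "110", "120", "120", "120"],
     "Value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=pd.to_datetime([
        "2020-01-01 08:00", "2020-01-01 08:01", "2020-01-01 08:02",
        "2020-01-01 08:00", "2020-01-01 08:01", "2020-01-01 08:02",
    ]),
)

# Hypothetical production units with their machine and time window.
production = pd.DataFrame({
    "Sarzs_no": ["A1", "B7"],
    "Unit": ["110", "120"],
    "Start": pd.to_datetime(["2020-01-01 08:00", "2020-01-01 08:01"]),
    "Stop": pd.to_datetime(["2020-01-01 08:01", "2020-01-01 08:02"]),
    "Output": [10, 20],
})

# Create the target columns up front so rows outside production stay empty.
sensors["Sarzs_no"] = None
sensors["Output"] = float("nan")

# Iterate over the production frame, but build and apply the mask on the sensor frame.
for _, row in production.iterrows():
    mask = (
        (sensors.index >= row["Start"])
        & (sensors.index <= row["Stop"])
        & (sensors["Unit"] == row["Unit"]).to_numpy()
    )
    sensors.loc[mask, "Sarzs_no"] = row["Sarzs_no"]
    sensors.loc[mask, "Output"] = row["Output"]

# Keep only the sensor rows that fell inside a production window of their own machine.
in_production = sensors.dropna(subset=["Sarzs_no"])
print(in_production)
```

The key point is that the loop variable and the masked frame are different objects: the rows come from the frame holding Start and Stop, and all three conditions are evaluated against the frame holding the minute-by-minute readings.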
