I have two DataFrames, OK_df and Not_OK_df:
import pandas as pd

OK_df = pd.DataFrame({'type_id' : [1,2,3,3], 'count' : [2,7,2,5], 'unique_id' : ['1|2','2|7','3|2','3|5'], 'status' : ['OK','OK','OK','OK']})
Not_OK_df = pd.DataFrame({'type_id' : [1,3,5,6,3,3,3,1], 'count' : [1,1,1,1,3,4,6,3], 'col3' : [1,5,7,3,4,7,2,2], 'unique_id' : ['1|1','3|1','5|1','6|1','3|3','3|4','3|6','1|3'], 'status' : ['Not_OK','Not_OK','Not_OK','Not_OK','Not_OK','Not_OK','Not_OK','Not_OK']})
OK_df:
   type_id  count unique_id status
0        1      2       1|2     OK
1        2      7       2|7     OK
2        3      2       3|2     OK
3        3      5       3|5     OK
Not_OK_df:
   type_id  count  col3 unique_id  status
0        1      1     1       1|1  Not_OK
1        3      1     5       3|1  Not_OK
2        5      1     7       5|1  Not_OK
3        6      1     3       6|1  Not_OK
4        3      3     4       3|3  Not_OK
5        3      4     7       3|4  Not_OK
6        3      6     2       3|6  Not_OK
7        1      3     2       1|3  Not_OK
where:
type_id: non-unique ID of the corresponding type.
count: running count since the type_id was first seen.
unique_id: combination of type_id and count, formatted as 'type_id|count'.
col3: another column.
status: either OK or Not_OK.
For a row in OK_df, there can be one or more rows in Not_OK_df with the same type_id and a count smaller than that OK_df row's count. I want to find the Not_OK_df rows that satisfy this condition, i.e.
(Not_OK_df['type_id'] == OK_df['type_id']) & (Not_OK_df['count'] < OK_df['count'])
Trying to apply this condition gives the error:
Reindexing only valid with uniquely valued Index objects
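An elementwise comparison like the one above lines up rows by index label, not by type_id, whereas the condition really needs each Not_OK_df row to be checked against every OK_df row with the same type_id. A minimal sketch of that pairwise check, assuming pandas and the sample frames from this question: merge on type_id so each Not_OK_df row is paired with all OK_df rows of its type, keep the pairs where the Not_OK count is smaller, and then select the Not_OK_df rows that had at least one surviving pair. The names pairs, count_ok and matching_labels are illustrative only.

```python
import pandas as pd

OK_df = pd.DataFrame({'type_id': [1, 2, 3, 3], 'count': [2, 7, 2, 5],
                      'unique_id': ['1|2', '2|7', '3|2', '3|5'],
                      'status': ['OK'] * 4})
Not_OK_df = pd.DataFrame({'type_id': [1, 3, 5, 6, 3, 3, 3, 1],
                          'count': [1, 1, 1, 1, 3, 4, 6, 3],
                          'col3': [1, 5, 7, 3, 4, 7, 2, 2],
                          'unique_id': ['1|1', '3|1', '5|1', '6|1',
                                        '3|3', '3|4', '3|6', '1|3'],
                          'status': ['Not_OK'] * 8})

# Pair every Not_OK_df row with every OK_df row of the same type_id.
# reset_index() keeps the original Not_OK_df row label in an 'index' column;
# the OK_df count column is renamed to 'count_ok' by the suffix.
pairs = Not_OK_df.reset_index().merge(
    OK_df[['type_id', 'count']], on='type_id', suffixes=('', '_ok')
)

# Labels of Not_OK_df rows with at least one OK_df row of a larger count.
matching_labels = pairs.loc[pairs['count'] < pairs['count_ok'], 'index']

# isin() keeps the original row order; reset_index renumbers from 0.
result = Not_OK_df[Not_OK_df.index.isin(matching_labels)].reset_index(drop=True)
print(result)
```

The merge can produce several pairs for one Not_OK_df row (type_id 3 matches two OK_df rows here), which is why the final selection goes through isin() rather than indexing with the pair labels directly.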
The expected output is:
   type_id  count  col3 unique_id  status
0        1      1     1       1|1  Not_OK
1        3      1     5       3|1  Not_OK
2        3      3     4       3|3  Not_OK
3        3      4     7       3|4  Not_OK
Note: the output excludes the rows with unique_id '3|6' and '1|3', because no OK_df row with the same type_id has a larger count. The rows with type_id 5 and 6 are excluded because those type_ids do not appear in OK_df at all.
How can I retrieve the required rows? Thanks in advance.
If I understand you correctly, your selection criteria are as follows. A row of Not_OK_df must:
- have the same type_id as some row in OK_df, and
- have a count smaller than the maximum count among the OK_df rows with that type_id.
First, create a dictionary that maps each unique type_id to its maximum count:
max_counts = OK_df.groupby('type_id')['count'].max().to_dict()
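To see what this lookup table contains, here is a small self-contained check, assuming the sample OK_df from the question. Selecting the 'count' column before calling max() computes the maximum only for that column, instead of for every column (including the string columns unique_id and status) and then discarding the rest.

```python
import pandas as pd

OK_df = pd.DataFrame({'type_id': [1, 2, 3, 3], 'count': [2, 7, 2, 5],
                      'unique_id': ['1|2', '2|7', '3|2', '3|5'],
                      'status': ['OK'] * 4})

# Largest OK count per type_id, as a dict keyed by type_id.
# For the sample data this compares equal to {1: 2, 2: 7, 3: 5}.
max_counts = OK_df.groupby('type_id')['count'].max().to_dict()
print(max_counts)
```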
Then check whether each row of Not_OK_df satisfies your criteria:
Not_OK_df[
    Not_OK_df.apply(
        # Keep a row only if its type_id exists in OK_df (checked first, so the
        # dict lookup never raises KeyError) and some OK_df row with that
        # type_id has a larger count.
        lambda not_ok_row: not_ok_row['type_id'] in max_counts
        and max_counts[not_ok_row['type_id']] > not_ok_row['count'],
        axis=1
    )
]
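A vectorized alternative to the row-wise apply, sketched under the same criteria: keep the per-type_id maximum as a Series and use Series.map to look up each Not_OK_df row's maximum OK count. Type_ids missing from OK_df (5 and 6 here) map to NaN, and a comparison against NaN evaluates to False, so those rows drop out without an explicit membership test. The name max_ok_count is illustrative, and reset_index(drop=True) is added only to renumber the rows like the expected output in the question.

```python
import pandas as pd

OK_df = pd.DataFrame({'type_id': [1, 2, 3, 3], 'count': [2, 7, 2, 5],
                      'unique_id': ['1|2', '2|7', '3|2', '3|5'],
                      'status': ['OK'] * 4})
Not_OK_df = pd.DataFrame({'type_id': [1, 3, 5, 6, 3, 3, 3, 1],
                          'count': [1, 1, 1, 1, 3, 4, 6, 3],
                          'col3': [1, 5, 7, 3, 4, 7, 2, 2],
                          'unique_id': ['1|1', '3|1', '5|1', '6|1',
                                        '3|3', '3|4', '3|6', '1|3'],
                          'status': ['Not_OK'] * 8})

# Maximum OK count for each type_id, as a Series indexed by type_id.
max_counts = OK_df.groupby('type_id')['count'].max()

# Each Not_OK row's maximum OK count; type_ids absent from OK_df become NaN.
max_ok_count = Not_OK_df['type_id'].map(max_counts)

# count < NaN is False, so rows whose type_id is not in OK_df are dropped.
result = Not_OK_df[Not_OK_df['count'] < max_ok_count].reset_index(drop=True)
print(result)
```

This avoids calling a Python lambda once per row, which tends to matter more as Not_OK_df grows.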