
Filter rows from grouped data based on a condition - python

I have data that is grouped by b_name. In my final output I need to keep only those groups that contain both F and P values; groups that contain only F or only P should be discarded. In the table below, only the b_name values whose group contains both F and P should be selected: XXXX, ZZZZ, and BBBB are selected and the others are not.

Input

[image: input table]

Output

[image: output table]

You could group by the b_name column and then use filter to keep only the groups whose p_f column contains both F and P. Next, remove the duplicated rows with drop_duplicates("b_name") and set p_f to the desired output value, "FP".

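import pandas as pd

# Load the semicolon-separated input file
df = pd.read_csv("sample.csv", sep=";")
print(df)

# Keep only the groups whose p_f column contains both "F" and "P"
df_group = df.groupby("b_name")
df_filter = df_group.filter(
    lambda x: ("F" in x["p_f"].values) and ("P" in x["p_f"].values)
)

# Keep one row per b_name; .copy() avoids a SettingWithCopyWarning below
df_filter = df_filter.drop_duplicates("b_name").copy()

# Mark every remaining group as containing both values
df_filter["p_f"] = "FP"
print(df_filter[["b_id", "b_name", "p_f"]])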

Output from df_filter

    b_id b_name p_f
0  29743   XXXX  FP
3  29751   ZZZZ  FP
6  30832   BBBB  FP
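
If you want to try this without sample.csv, here is a minimal self-contained sketch. The input table was posted as an image, so the rows below are hypothetical: only the three selected b_id/b_name pairs come from the output above, while YYYY, AAAA and their b_id values are invented for illustration. It also shows an alternative to filter that builds a boolean mask with transform("any"), which avoids calling a Python lambda once per group.

```python
import pandas as pd

# Hypothetical stand-in for sample.csv: the original input table is not
# available, so the YYYY and AAAA rows (and their b_id values) are invented.
df = pd.DataFrame({
    "b_id":   [29743, 29743, 29750, 29751, 29751, 29760, 30832, 30832],
    "b_name": ["XXXX", "XXXX", "YYYY", "ZZZZ", "ZZZZ", "AAAA", "BBBB", "BBBB"],
    "p_f":    ["F", "P", "F", "P", "F", "P", "F", "P"],
})

# For every row, does its b_name group contain at least one "F" / one "P"?
has_f = df["p_f"].eq("F").groupby(df["b_name"]).transform("any")
has_p = df["p_f"].eq("P").groupby(df["b_name"]).transform("any")

# Keep rows from groups that have both, one row per b_name, labelled "FP"
result = (
    df[has_f & has_p]
    .drop_duplicates("b_name")
    .assign(p_f="FP")
)
print(result[["b_id", "b_name", "p_f"]])
```

On this sample data the mask keeps the same groups as the filter version; which one is faster on real data would depend on the number of groups.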


 