Filter rows from grouped data based on a condition - python

Question

I have a dataset in which rows are grouped together by b_name. In my final output I need to keep only those groups that contain both F and P values. Groups that contain only F or only P should be discarded. In the table below, only the b_name values whose group contains both F and P should be selected: XXXX, ZZZZ and BBBB are selected, and the others are not.

Input

Output

Answer 1

You could group by the b_name column and then use filter to keep only the groups that have both F and P in the p_f column. Next, remove the duplicated rows with drop_duplicates("b_name") so that each b_name appears once, and set p_f to "FP" to get the desired output.

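import pandas as pd

# Read the semicolon-separated input file
df = pd.read_csv("sample.csv", sep=";")
print(df)

# Keep only the groups whose p_f column contains both "F" and "P"
df_group = df.groupby("b_name")
df_filter = df_group.filter(lambda x:
        ("F" in x.p_f.values) and ("P" in x.p_f.values)
      )
# Keep one row per b_name; copy() makes the assignment below act on an independent frame
df_filter = df_filter.drop_duplicates("b_name").copy()

# Label each remaining group as containing both values
df_filter["p_f"] = "FP"
print(df_filter[["b_id", "b_name", "p_f"]])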

Output from df_filter

    b_id b_name p_f
0  29743   XXXX  FP
3  29751   ZZZZ  FP
6  30832   BBBB  FP
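
If you want to try the approach without sample.csv, here is a minimal self-contained sketch. The rows are hypothetical, made up for illustration since the original input table is not shown; only the column names b_id, b_name and p_f come from the answer. It also swaps the two "in" checks for a set-based test, which is an alternative way to write the same condition, not what the answer above uses.

```python
import pandas as pd

# Hypothetical sample data (assumption: the real input has these three columns).
df = pd.DataFrame({
    "b_id":   [29743, 29743, 29750, 29751, 29751, 29760, 30832, 30832],
    "b_name": ["XXXX", "XXXX", "YYYY", "ZZZZ", "ZZZZ", "AAAA", "BBBB", "BBBB"],
    "p_f":    ["F", "P", "F", "P", "F", "P", "F", "P"],
})

# Keep only the groups whose p_f values include both "F" and "P".
both = df.groupby("b_name").filter(lambda g: {"F", "P"}.issubset(g["p_f"]))

# One row per b_name, relabelled "FP"; assign() returns a new frame,
# so the filtered frame is not modified in place.
result = both.drop_duplicates("b_name").assign(p_f="FP")
print(result[["b_id", "b_name", "p_f"]])
```

With this made-up data, YYYY and AAAA are dropped because each has only one of the two values, and XXXX, ZZZZ and BBBB remain with one row each.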

solution1
1 ACCEPTED 2021-03-02 00:38:21