Group by and filter based on a condition in pandas

Question

I want to drop a whole group if a condition on one column is satisfied (ignore columns X1 and X2):

 Subject  Visit           X1      X2
   A       aaa          1647143  1672244
   A       creamy       1672244  1689707
   A       bbb          1689707  1713090
   B       yyy          1735352  1760283
   B       ice cream    1760283  1788062
   C       foo          1788062  1789885
   C       doo          1789885  1790728

For example, if "Visit" contains the string "cream", all records of Subject A and Subject B should be deleted, and the result would be:

Subject  Visit       X1       X2
 C       foo    1788062  1789885
 C       doo    1789885  1790728
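To reproduce the answers below, the sample frame can be built as follows. This is a sketch: the values are typed in from the question's table, and the column dtypes are whatever pandas infers.

```python
import pandas as pd

# Sample data typed in from the question's table
df = pd.DataFrame({
    "Subject": ["A", "A", "A", "B", "B", "C", "C"],
    "Visit": ["aaa", "creamy", "bbb", "yyy", "ice cream", "foo", "doo"],
    "X1": [1647143, 1672244, 1689707, 1735352, 1760283, 1788062, 1789885],
    "X2": [1672244, 1689707, 1713090, 1760283, 1788062, 1789885, 1790728],
})
print(df)
```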

I tried the following, but it did not delete the whole group's records, only the matching rows:

import re

df.groupby(by=['Subject']).apply(lambda d: d[~d['Visit'].str.contains('cream', flags=re.I, regex=True)])
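The attempt keeps the other rows of a matching group because its mask is evaluated row by row: only the rows whose own Visit mentions "cream" are removed. A small sketch on a trimmed copy of the sample data (only Subject and Visit) makes this visible:

```python
import re
import pandas as pd

df = pd.DataFrame({
    "Subject": ["A", "A", "A", "B", "B", "C", "C"],
    "Visit": ["aaa", "creamy", "bbb", "yyy", "ice cream", "foo", "doo"],
})

# Row-level mask: True only on the rows whose own Visit mentions "cream"
row_mask = df["Visit"].str.contains("cream", flags=re.I, regex=True)

# Filtering with it drops "creamy" and "ice cream" but keeps their siblings
print(df[~row_mask]["Visit"].tolist())  # ['aaa', 'bbb', 'yyy', 'foo', 'doo']
```

What is needed instead is a mask computed per group and then applied to every row of that group, which is what the answers below do in different ways.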

Answer 1

Group by Subject, then check whether the Visit column of each group contains the string "cream".

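def move_group(group):
    # Return the group only if none of its Visit values contain "cream";
    # otherwise the function implicitly returns None
    if not group['Visit'].str.contains('cream').any():
        return group

# dropna() cleans up any missing values left by the groups that returned None
df_ = df.groupby('Subject').apply(move_group).dropna()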

# print(df_)

  Subject Visit         X1         X2
5       C   foo  1788062.0  1789885.0
6       C   doo  1789885.0  1790728.0
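Because move_group returns None for the groups it discards, this answer relies on dropna(), and the printed result above shows X1 and X2 as floats. Note that dropna() would also remove any legitimate row that happens to contain a missing value. A swapped-in alternative (not this answer's method) is to collect the offending Subjects first and filter with isin, which leaves the integer columns untouched:

```python
import pandas as pd

df = pd.DataFrame({
    "Subject": ["A", "A", "A", "B", "B", "C", "C"],
    "Visit": ["aaa", "creamy", "bbb", "yyy", "ice cream", "foo", "doo"],
    "X1": [1647143, 1672244, 1689707, 1735352, 1760283, 1788062, 1789885],
    "X2": [1672244, 1689707, 1713090, 1760283, 1788062, 1789885, 1790728],
})

# Subjects that have at least one Visit mentioning "cream"
has_cream = df.loc[df["Visit"].str.contains("cream"), "Subject"].unique()

# Keep only the rows whose Subject is not in that set; no NaN is introduced
result = df[~df["Subject"].isin(has_cream)]
print(result)
```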

Answer 2

Use transform to broadcast a True/False flag to every row of a group, depending on whether the group contains 'cream' or not. Then drop the rows flagged True.

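import re
import numpy as np

# True on every row whose Subject group has at least one Visit containing "cream"
mask = (df.groupby('Subject')['Visit']
        .transform(lambda d: np.any(
              d.str.contains('cream', flags=re.I, regex=True)))
        )
df = df[~mask]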
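The lambda with np.any can also be written with the built-in "any" aggregation name in transform, and case=False in str.contains does the same job as flags=re.I. A sketch of that variant on the sample data (a rewrite for illustration, not the answer's own code):

```python
import pandas as pd

df = pd.DataFrame({
    "Subject": ["A", "A", "A", "B", "B", "C", "C"],
    "Visit": ["aaa", "creamy", "bbb", "yyy", "ice cream", "foo", "doo"],
})

# Row-level flag, case-insensitive (case=False replaces flags=re.I)
flag = df["Visit"].str.contains("cream", case=False)

# Broadcast "does any row in this Subject match?" back onto every row
mask = flag.groupby(df["Subject"]).transform("any")

result = df[~mask]
print(result)
```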

Answer 3

You can use GroupBy.filter:

df.groupby("Subject").filter(lambda gr: ~gr.Visit.str.contains("cream").any())

to get

  Subject Visit       X1       X2
5       C   foo  1788062  1789885
6       C   doo  1789885  1790728

The filter reads as: keep the groups that do not (~) contain (str.contains) any (any) "cream" in the Visit column.
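The ~ works here because Series.any() returns a NumPy boolean, for which ~ is logical negation. On a plain Python bool, ~True evaluates to -2, which is truthy (and bitwise inversion of bool is deprecated as of Python 3.12), so `not` is the safer spelling if the expression could ever yield a built-in bool. A sketch of the same filter using `not`:

```python
import pandas as pd

df = pd.DataFrame({
    "Subject": ["A", "A", "A", "B", "B", "C", "C"],
    "Visit": ["aaa", "creamy", "bbb", "yyy", "ice cream", "foo", "doo"],
})

# Keep a group only when none of its Visit values mention "cream";
# `not` always performs logical negation, whatever bool type any() returns
result = df.groupby("Subject").filter(
    lambda gr: not gr["Visit"].str.contains("cream").any()
)
print(result)
```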

Answer 4

You can first create a column that flags the presence of "cream", then filter with transform on the per-group sum of those booleans:

(df
.assign(cream = df.Visit.str.contains("cream"))
.loc[lambda df: df.groupby("Subject")
                  .cream
                  .transform("sum")==0, 
     df.columns]
)
Out[14]: 
  Subject Visit       X1       X2
5       C   foo  1788062  1789885
6       C   doo  1789885  1790728
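Because True counts as 1 when summed, a group's sum of the cream flags is 0 exactly when no row in it matches, so transform("sum") == 0 selects the same rows as the negated transform("any"). A quick check on the sample data (an illustration, not part of the answer):

```python
import pandas as pd

df = pd.DataFrame({
    "Subject": ["A", "A", "A", "B", "B", "C", "C"],
    "Visit": ["aaa", "creamy", "bbb", "yyy", "ice cream", "foo", "doo"],
})

flag = df["Visit"].str.contains("cream")
by_subject = flag.groupby(df["Subject"])

keep_by_sum = by_subject.transform("sum") == 0  # number of matches in the group is zero
keep_by_any = ~by_subject.transform("any")      # no match anywhere in the group

print(keep_by_sum.tolist() == keep_by_any.tolist())  # True
```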

4 answers

solution1
0 2021-04-11 09:47:15

solution2
0 2021-04-11 09:50:37

solution3
0 2021-04-11 09:52:14

solution4
0 2021-04-12 02:32:40