
Group by and filter based on a condition in pandas

I want to drop a whole group if any row in it satisfies a condition on one column (ignore columns X1 and X2):

 Subject  Visit           X1      X2
   A       aaa          1647143  1672244
   A       creamy       1672244  1689707
   A       bbb          1689707  1713090
   B       yyy          1735352  1760283
   B       ice cream    1760283  1788062
   C       foo          1788062  1789885
   C       doo          1789885  1790728

For example, if "Visit" contains the string "cream", all Subject A and Subject B records should be deleted, and the result would be:

 Subject  Visit           X1      X2
   C       foo          1788062  1789885
   C       doo          1789885  1790728

I tried the following, but it didn't delete the whole group's records:

df.groupby(by=['Subject']).apply(lambda d: d[~d['Visit'].str.contains('cream',flags=re.I, regex=True)])

Group by Subject, then check whether the Visit column of each group contains the string cream.

def move_group(group):
    if not any(group['Visit'].str.contains('cream')):
        return group

df_ = df.groupby('Subject').apply(move_group).dropna()
# print(df_)

  Subject Visit         X1         X2
5       C   foo  1788062.0  1789885.0
6       C   doo  1789885.0  1790728.0

Use transform to assign True/False to every row of a group, depending on whether that group contains 'cream'. Then drop the rows where the mask is True.

import re
import numpy as np

mask = (df.groupby('Subject')['Visit']
        .transform(lambda d: np.any(
              d.str.contains('cream', flags = re.I, regex = True)))
        )
df = df[~mask]
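
As a rough, self-contained sketch of the same idea, the mask can also be built by computing the row-level match first and then broadcasting it per group with the built-in "any" aggregation instead of a lambda. The sample frame below is rebuilt from the question's table; the names has_cream and result are only illustrative.

```python
import re

import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "Subject": ["A", "A", "A", "B", "B", "C", "C"],
    "Visit": ["aaa", "creamy", "bbb", "yyy", "ice cream", "foo", "doo"],
    "X1": [1647143, 1672244, 1689707, 1735352, 1760283, 1788062, 1789885],
    "X2": [1672244, 1689707, 1713090, 1760283, 1788062, 1789885, 1790728],
})

# Row-level test: does this Visit value contain "cream" (case-insensitive)?
has_cream = df["Visit"].str.contains("cream", flags=re.IGNORECASE, regex=True)

# Broadcast to the group: True for every row whose Subject has at least one match.
mask = has_cream.groupby(df["Subject"]).transform("any")

# Keep only the rows of groups without any match.
result = df[~mask]
print(result)
```

Because the mask is aligned with the original index, the surviving rows keep their original labels.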

You can use GroupBy.filter :

df.groupby("Subject").filter(lambda gr: ~gr.Visit.str.contains("cream").any())

to get

  Subject Visit       X1       X2
5       C   foo  1788062  1789885
6       C   doo  1789885  1790728

The filter reads as: "keep the groups that do not ( ~ ) contain ( str.contains ) any ( any ) "cream" in the Visit column".
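
A runnable sketch of the filter approach, under two assumptions that are not in the answer: the predicate uses Python's not rather than ~ so that it returns a plain bool whatever scalar type any() yields, and the match is made case-insensitive with case=False to mirror the re.I flag in the question's attempt. The helper name has_no_cream is hypothetical.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "Subject": ["A", "A", "A", "B", "B", "C", "C"],
    "Visit": ["aaa", "creamy", "bbb", "yyy", "ice cream", "foo", "doo"],
    "X1": [1647143, 1672244, 1689707, 1735352, 1760283, 1788062, 1789885],
    "X2": [1672244, 1689707, 1713090, 1760283, 1788062, 1789885, 1790728],
})


def has_no_cream(group: pd.DataFrame) -> bool:
    """Return True when no Visit value in the group contains 'cream'."""
    return not group["Visit"].str.contains("cream", case=False).any()


# filter() keeps every row of the groups for which the predicate is True.
kept = df.groupby("Subject").filter(has_no_cream)
print(kept)
```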

You can also first create a column that flags the presence of cream, then filter with transform on the per-group sum of those booleans, keeping only the groups whose sum is 0:

(df
.assign(cream = df.Visit.str.contains("cream"))
.loc[lambda df: df.groupby("Subject")
                  .cream
                  .transform("sum")==0, 
     df.columns]
)
Out[14]: 
  Subject Visit       X1       X2
5       C   foo  1788062  1789885
6       C   doo  1789885  1790728
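
For comparison, here is a sketch of a different technique that none of the answers above use: it skips groupby entirely, collects the Subject values that have at least one matching row, and excludes them with isin. The name bad_subjects and the case-insensitive match are assumptions made for illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "Subject": ["A", "A", "A", "B", "B", "C", "C"],
    "Visit": ["aaa", "creamy", "bbb", "yyy", "ice cream", "foo", "doo"],
    "X1": [1647143, 1672244, 1689707, 1735352, 1760283, 1788062, 1789885],
    "X2": [1672244, 1689707, 1713090, 1760283, 1788062, 1789885, 1790728],
})

# Subjects that have at least one Visit containing "cream".
bad_subjects = df.loc[
    df["Visit"].str.contains("cream", case=False), "Subject"
].unique()

# Drop every row belonging to one of those subjects.
result = df[~df["Subject"].isin(bad_subjects)]
print(result)
```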
