
Only show specific groups in a df pandas

Hello, I need to focus on specific groups within a table.

Here is an example:

groups col1 
A 3
A 4
A 2
A 1
B 3
B 3
B 4
C 2
D 4
D 3

I would like to show only the groups that contain 3 and 4 but no other number. Here I should get:

groups col1 
B 3
B 3
B 4
D 4
D 3

Here are two possible approaches. The first tests the values for membership with Series.isin, then uses GroupBy.transform with 'all' to flag the groups in which every row is True, and finally filters by boolean indexing:

df1 = df[df['col1'].isin([3,4]).groupby(df['groups']).transform('all')]
print (df1)
  groups  col1
4      B     3
5      B     3
6      B     4
8      D     4
9      D     3
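To make the one-liner above easier to follow, here is a minimal sketch that splits it into its three steps. The DataFrame is rebuilt from the question's table, and the intermediate names is_allowed and group_ok are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "groups": list("AAAABBBCDD"),
    "col1": [3, 4, 2, 1, 3, 3, 4, 2, 4, 3],
})

# Step 1: row-level test, True where col1 is 3 or 4
is_allowed = df["col1"].isin([3, 4])

# Step 2: reduce each group with all() and broadcast the result
# back to every row of that group
group_ok = is_allowed.groupby(df["groups"]).transform("all")

# Step 3: boolean indexing keeps only the rows of fully matching groups
df1 = df[group_ok]
print(df1)
```

Because transform returns a Series aligned with the original index, the mask can be used directly in df[...] without any merge or reindexing.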

Another approach is to first collect the groups that contain any value other than 3 or 4, then pass them to a second isin call and invert the mask:

df1 = df[~df['groups'].isin(df.loc[~df['col1'].isin([3,4]), 'groups'])]
print (df1)
  groups  col1
4      B     3
5      B     3
6      B     4
8      D     4
9      D     3
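The nested expression above can be unpacked as follows. This is only a sketch on the question's sample data; bad_groups is an illustrative name, and the call to unique() is an optional addition that is not in the original one-liner.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "groups": list("AAAABBBCDD"),
    "col1": [3, 4, 2, 1, 3, 3, 4, 2, 4, 3],
})

# Rows whose value is NOT 3 or 4 identify the groups to reject
not_allowed = ~df["col1"].isin([3, 4])
bad_groups = df.loc[not_allowed, "groups"].unique()

# Keep every row whose group never appeared among the rejected ones
df1 = df[~df["groups"].isin(bad_groups)]
print(df1)
```

This version avoids groupby entirely, which may be convenient when the rejected groups are also needed for reporting.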

We can also use GroupBy.filter:

new_df = df.groupby('groups').filter(lambda x: x.col1.isin([3, 4]).all())
print(new_df)

  groups  col1
4      B     3
5      B     3
6      B     4
8      D     4
9      D     3

An alternative that moves Series.isin out of the lambda function:

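# Precompute the mask in a temporary column; assign() leaves df itself unchanged
tmp = df.assign(aux=df['col1'].isin([3, 4]))
new_df = tmp.groupby('groups').filter(lambda x: x.aux.all()).drop('aux', axis=1)
print(new_df)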
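One caveat about all of the solutions so far: they keep any group whose values are a subset of {3, 4}, so a group holding only 3s would also pass. If "contain 3 and 4" is meant strictly (both values must be present), comparing each group's set of values is one possible way to express it. The sketch below assumes that stricter reading; group E is a hypothetical extra group added only to show the difference.

```python
import pandas as pd

# Question's sample data plus a hypothetical group E that holds only 3s
df = pd.DataFrame({
    "groups": list("AAAABBBCDDEE"),
    "col1": [3, 4, 2, 1, 3, 3, 4, 2, 4, 3, 3, 3],
})

required = {3, 4}

# Loose reading: every value is 3 or 4 (group E passes)
loose = df.groupby("groups").filter(
    lambda g: g["col1"].isin(sorted(required)).all()
)

# Strict reading: the group's distinct values are exactly {3, 4} (group E fails)
strict = df.groupby("groups").filter(lambda g: set(g["col1"]) == required)

print(loose["groups"].unique())
print(strict["groups"].unique())
```

On the original ten-row table both readings return the same rows, so the distinction only matters when such single-value groups can occur.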

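Using df.loc[] with a boolean mask also works, as long as the mask is built per group: a plain row filter such as col1 >= 3 would also keep the 3 and 4 rows of group A.

import pandas as pd

data = [['A', 3],
        ['A', 4],
        ['A', 2],
        ['A', 1],
        ['B', 3],
        ['B', 3],
        ['B', 4],
        ['C', 2],
        ['D', 4],
        ['D', 3]]
df = pd.DataFrame(data, columns=["groups", "col1"])

# Groups that have at least one value other than 3 or 4
bad_groups = df.loc[~df["col1"].isin([3, 4]), "groups"]
# Keep only the rows whose group is not among them
df = df.loc[~df["groups"].isin(bad_groups)]
print(df)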

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 