Find a row matching multiple column criteria

Question

I have a dataframe with 2M rows which is in the below format:

ID   Number
1    30
1    40
1    60
2    10
2    30
3    60

I need to select the IDs have the number 30 and 40 present (in this case, output should be 1).

I know we can create a new DF having only numbers 30 & 40 and then groupby to see which IDs have more than count 1. But is there a way we can to do both in the groupby statement?

My code:

a=df[(df['Number']==30) | (df['Number']==40) ]
b=a.groupby('ID')['Number'].nunique().to_frame(name='tt').reset_index()
b[b['tt'] > 1]

Answer 1

Use groupby filter and issubset

s = {30, 40}
df.groupby('ID').filter(lambda x: s.issubset(set(x.Number)))

Out[158]:
   ID  Number
0   1      30
1   1      40
2   1      60

Answer 2

I would create a df for each condition and then inner join them:

df1 = df[df.Number == 30][['Number']]
df2 = df[df.Number == 40][['Number']]

df3 = df1.join(df2,how='inner',on='Number')

Answer 3

I find the fact that the describe() method of Groupby objects returns a dataframe to be extremely helpful.

Output temp1 = a.groupby("ID").describe() and temp2 = a.groupby("ID").describe()["Number"] to a Jupyter notebook to see what they look like, then the following code (which follows on from yours) should make sense.

summary = a.groupby("ID").describe()["Number"]
summary.loc[summary["count"] > 1].index

Find a row matching multiple column criteria

Question

3 answers

solution1
2 ACCPTED 2019-10-30 16:21:05

solution2
1 2019-10-30 16:09:07

solution3
1 2019-10-30 16:11:51

Find a row matching multiple column criteria

Question

3 answers

solution1 2 ACCPTED 2019-10-30 16:21:05

solution2 1 2019-10-30 16:09:07

solution3 1 2019-10-30 16:11:51

solution1
2 ACCPTED 2019-10-30 16:21:05

solution2
1 2019-10-30 16:09:07

solution3
1 2019-10-30 16:11:51