I have a DataFrame like this:
import pandas as pd
data = {'Index Title': ["Company1", "Company1", "Company2", "Company3"],
        'BusinessType': ['Type 1', 'Type 2', 'Type 1', 'Type 2'],
        'ID1': ['123', '456', '789', '012']}
df = pd.DataFrame(data)
df.index = df["Index Title"]
del df["Index Title"]
print(df)
where Index Title is a company name. Company1 has two types, Type 1 and Type 2.
I would like to drop the rows for companies that have only one type, either Type 1 or Type 2.
So in this case Company2 and Company3 should be dropped.
What is the best way to do that?
For problems like this, the usual approach is groupby- and transform-based filtering, which is quite fast:
df[df.groupby(level=0)['BusinessType'].transform('nunique') > 1]
BusinessType ID1
Index Title
Company1 Type 1 123
Company1 Type 2 456
The first step is to determine the groups/rows which are associated with more than one type:
df.groupby(level=0)['BusinessType'].transform('nunique')
Index Title
Company1 2
Company1 2
Company2 1
Company3 1
Name: BusinessType, dtype: int64
From here, we keep only the rows for companies whose number of unique types is greater than 1, which removes all companies associated with exactly one type.
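Putting the two steps together, here is a minimal, self-contained sketch using the question's sample data (the variable names `counts` and `result` are just illustrative):

```python
import pandas as pd

# Sample data from the question
data = {'Index Title': ['Company1', 'Company1', 'Company2', 'Company3'],
        'BusinessType': ['Type 1', 'Type 2', 'Type 1', 'Type 2'],
        'ID1': ['123', '456', '789', '012']}
df = pd.DataFrame(data).set_index('Index Title')

# Count distinct types per company, broadcast back to every row of the group
counts = df.groupby(level=0)['BusinessType'].transform('nunique')

# Boolean mask: keep rows belonging to companies with more than one type
result = df[counts > 1]
print(result)  # only the two Company1 rows remain
```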
This is another way:
- group by Index Title
- keep a group only if it contains at least one Type 1 and at least one Type 2
df = (
    df.groupby('Index Title')
      .filter(lambda x: (x['BusinessType'] == 'Type 1').any() &
                        (x['BusinessType'] == 'Type 2').any())
      .reset_index()
)
Update: if you are looking for companies with two or more types, regardless of which types they are:
df = (
    df.groupby('Index Title')
      .filter(lambda x: x['BusinessType'].nunique() > 1)
      .reset_index()
)
In that case @cs95's answer is the cleaner one, and the one you should use.
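If you do need the stricter "must contain both Type 1 and Type 2" condition but want to stay with fast, vectorized transforms instead of a Python-level `filter` lambda, one possible sketch (the mask names `has_t1`/`has_t2` are illustrative) is:

```python
import pandas as pd

# Sample data from the question
data = {'Index Title': ['Company1', 'Company1', 'Company2', 'Company3'],
        'BusinessType': ['Type 1', 'Type 2', 'Type 1', 'Type 2'],
        'ID1': ['123', '456', '789', '012']}
df = pd.DataFrame(data).set_index('Index Title')

# Broadcast "this company has Type 1" / "has Type 2" back to every row
has_t1 = df['BusinessType'].eq('Type 1').groupby(df.index).transform('any')
has_t2 = df['BusinessType'].eq('Type 2').groupby(df.index).transform('any')

# Keep rows only for companies that have both types
result = df[has_t1 & has_t2]
print(result)  # only the two Company1 rows survive
```

For this sample data the result matches the `nunique` approach; the two differ only when a company could have two or more types without having exactly Type 1 and Type 2.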