import pandas as pd

df = pd.DataFrame({
    "Continent": list("AAABBBCCD"),
    "Country": list("FGHIJKLMN"),
    "Population": [90, 140, 50, 80, 80, 70, 50, 125, 50]})
I want to return all rows for the continents in which every country has a population below 100.
  Continent Country  Population
0         A       F          90
1         A       G         140
2         A       H          50
3         B       I          80
4         B       J          80
5         B       K          70
6         C       L          50
7         C       M         125
8         D       N          50
Every row in Continent A is removed because Country G has a population greater than 100. Every row in Continent C is removed because of Country M. I want the returned DataFrame to look like below:
  Continent Country  Population
3         B       I          80
4         B       J          80
5         B       K          70
8         D       N          50
I tried df[df["Population"] <= 100], but that filters each row on its own and I couldn't determine how to apply the condition per Continent.
Here is one way to do it:
# group by Continent
# transform broadcasts one True/False value per group to every row in it:
# True when the maximum population in the group is below 100
out = df[df.groupby(['Continent'])['Population'].transform(lambda x: x.max() < 100)]
out
  Continent Country  Population
3         B       I          80
4         B       J          80
5         B       K          70
8         D       N          50
Here is another way to accomplish it:
import pandas as pd
df = pd.DataFrame({
"Continent": list("AAABBBCCD"),
"Country": list("FGHIJKLMN"),
"Population": [90, 140, 50, 80, 80, 70, 50, 125, 50]})
df.loc[df.groupby(['Continent'])['Population'].transform('max') <= 100]
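I usually avoid lambda here because it tends to be slow, while the string 'max' lets pandas use its built-in aggregation. The answer above also works; this is just another option. Note that this version uses <= 100, so a continent whose largest population is exactly 100 would be kept; use < 100 if the bound should be strict.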
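Both answers rely on transform returning one value per original row rather than one per group. A minimal sketch of the intermediate steps, using the question's data, may make that clearer; the names group_max and mask are only illustrative and do not appear in either answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Continent": list("AAABBBCCD"),
    "Country": list("FGHIJKLMN"),
    "Population": [90, 140, 50, 80, 80, 70, 50, 125, 50]})

# Each row receives the maximum population of its own continent,
# so the result has the same length and index as df.
group_max = df.groupby("Continent")["Population"].transform("max")

# Boolean mask: True for every row of a continent whose maximum is below 100.
mask = group_max < 100

# Boolean indexing keeps only the rows where the mask is True.
out = df[mask]
print(out)
```

Because the mask is aligned with df row by row, it can be passed to either df[...] or df.loc[...] with the same result.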