Group and Compare multiple dataframe columns with conditions in Python

Question

I'm tryng to print out the states with highest population in each region.

Code Sample :

# all unique regions
region_unique = data['Region'].unique()

# highest population
max_pop = data['population'].max()

How can I chain the above lines of code and bring in the 'States' column to achieve my result?

Dataset :

Answer 1

Considering you haven't mentioned any library...

You could first create a helper dict , mapping each region to an array of states. Each state is a tuple: (state, pop) (name and population count):

regions = {}
for state, pop, region in zip(data['States'], data['population'], data['Region']):
    res.setdefault(region, []).append((state, pop))

Then for each region you can pull out the most inhabited state:

for region, states in regions.items():
    print(region, max(states, key=lambda _, pop: pop))

To states under each region with a population less than 100 , you can do:

for region, states in regions.items():
    print(region, list(filter(lambda state: state[1] > 100, states)))

Group and Compare multiple dataframe columns with conditions in Python

Question

1 answers

solution1
0 ACCPTED 2020-11-28 13:15:49

Group and Compare multiple dataframe columns with conditions in Python

Question

1 answers

solution1 0 ACCPTED 2020-11-28 13:15:49

solution1
0 ACCPTED 2020-11-28 13:15:49