Grouping and comparing multiple dataframe columns using conditions in Python

Question

I'm trying to print out the state with the highest population in each region.

Code sample:

# all unique regions
region_unique = data['Region'].unique()

# highest population
max_pop = data['population'].max()

How can I chain the above lines of code and bring in the 'States' column to achieve my result?

Dataset:

Answer 1

Considering you haven't mentioned any library...

You could first create a helper dict that maps each region to a list of its states. Each state is a tuple (state, pop), holding its name and population count:

# region -> list of (state, population) tuples
regions = {}
for state, pop, region in zip(data['States'], data['population'], data['Region']):
    regions.setdefault(region, []).append((state, pop))

Then for each region you can pull out the most populous state:

for region, states in regions.items():
    # max() passes each (state, pop) tuple as a single argument, so index into it
    print(region, max(states, key=lambda state: state[1]))
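For reference, the two steps above can be combined into one small script. This is only a sketch: the sample data is made up (the real dataset is not shown here), and `top_by_region` is a name introduced just for illustration.

```python
# Hypothetical sample data; the column names follow the question,
# the values are invented for illustration.
data = {
    "States": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon"],
    "population": [120, 80, 300, 95, 40],
    "Region": ["North", "North", "South", "South", "West"],
}

# Map each region to a list of (state, population) tuples.
regions = {}
for state, pop, region in zip(data["States"], data["population"], data["Region"]):
    regions.setdefault(region, []).append((state, pop))

# For each region, keep the tuple with the largest population.
top_by_region = {
    region: max(states, key=lambda state: state[1])
    for region, states in regions.items()
}

for region, (state, pop) in top_by_region.items():
    print(region, state, pop)
```

Because `zip` only needs iterables, the same loop should work whether `data` is a plain dict of lists, as assumed here, or a dataframe with those columns.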

To list the states in each region with a population of less than 100, you can do:

for region, states in regions.items():
    # keep only the (state, pop) tuples whose population is below 100
    print(region, list(filter(lambda state: state[1] < 100, states)))
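If `data` is in fact a pandas DataFrame (an assumption; the question never names a library, although `.unique()` and `.max()` hint at it), both results can probably be obtained without a helper dict. The sketch below uses invented sample values, and `top_rows` and `small_by_region` are illustrative names.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
data = pd.DataFrame({
    "States": ["Alpha", "Beta", "Gamma", "Delta", "Epsilon"],
    "population": [120, 80, 300, 95, 40],
    "Region": ["North", "North", "South", "South", "West"],
})

# idxmax gives, for each region, the row label of the largest population;
# .loc then pulls those full rows, including the 'States' column.
top_rows = data.loc[
    data.groupby("Region")["population"].idxmax(),
    ["Region", "States", "population"],
]
print(top_rows)

# Boolean mask for populations under 100, then collect state names per region.
small = data[data["population"] < 100]
small_by_region = small.groupby("Region")["States"].apply(list).to_dict()
print(small_by_region)
```

Note that `idxmax` returns only the first row when two states in a region tie on population, and that regions with no state under the threshold do not appear in `small_by_region`.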

Answer 1: accepted, 2020-11-28 13:15:49
