pandas grouping and visualization

Question

I have to do some analysis using Python 3 and pandas on a dataset, shown here as a toy example:

data
'''
    location importance    agent  count
0     London        Low  chatbot      2
1        NYC     Medium  chatbot      1
2     London       High    human      3
3     London        Low    human      4
4        NYC       High    human      1
5        NYC     Medium  chatbot      2
6  Melbourne        Low  chatbot      3
7  Melbourne        Low    human      4
8  Melbourne       High    human      5
9        NYC       High  chatbot      5
'''

My aim is to group by location and then count the number of Low, Medium and/or High values in the 'importance' column for each location. So far, the code I have come up with is:

data.groupby(['location', 'importance']).aggregate(np.size)
'''
                      agent  count
location  importance              
London    High            1      1
          Low             2      2
Melbourne High            1      1
          Low             2      2
NYC       High            2      2
          Medium          2      2
'''

The result of this grouping and count aggregation has the grouping keys as its index:

data.groupby(['location', 'importance']).aggregate(np.size).index

I don't know how to proceed from here. Also, how can I visualize this?

Help?

Answer 1

I think you need DataFrame.pivot_table, with aggfunc='sum' to aggregate the count values when a location/importance pair occurs more than once, and then DataFrame.plot:

df = data.pivot_table(index='location', columns='importance', values='count', aggfunc='sum')

df.plot()
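
For reference, here is a self-contained sketch of the pivot_table approach on the toy data from the question. The fill_value=0 argument, the column reordering, the bar chart and the variable name summed are my own additions for readability, not part of the original snippet, and the Agg backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import pandas as pd

# Toy data copied from the question
data = pd.DataFrame({
    "location": ["London", "NYC", "London", "London", "NYC",
                 "NYC", "Melbourne", "Melbourne", "Melbourne", "NYC"],
    "importance": ["Low", "Medium", "High", "Low", "High",
                   "Medium", "Low", "Low", "High", "High"],
    "agent": ["chatbot", "chatbot", "human", "human", "human",
              "chatbot", "chatbot", "human", "human", "chatbot"],
    "count": [2, 1, 3, 4, 1, 2, 3, 4, 5, 5],
})

# Sum the 'count' column for every location/importance pair;
# pairs that never occur are filled with 0 instead of NaN
summed = data.pivot_table(index="location", columns="importance",
                          values="count", aggfunc="sum", fill_value=0)

# Put the importance levels in a natural order (default is alphabetical)
summed = summed.reindex(columns=["Low", "Medium", "High"])

# One group of bars per location, one bar per importance level
ax = summed.plot(kind="bar")
ax.set_ylabel("sum of count")
```

A bar chart is used here instead of the default line plot because both axes are categorical, so lines between locations would not mean anything.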

If you need the number of rows for each location/importance pair instead, use crosstab:

df = pd.crosstab(data['location'], data['importance'])

df.plot()
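
As a minimal sketch on the same toy data, the crosstab call can be checked against the groupby the question started from. The stacked bar chart, the groupby/unstack alternative and the names counts and counts_alt are my own additions for illustration; the Agg backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import pandas as pd

# Toy data copied from the question (only the columns needed here)
data = pd.DataFrame({
    "location": ["London", "NYC", "London", "London", "NYC",
                 "NYC", "Melbourne", "Melbourne", "Melbourne", "NYC"],
    "importance": ["Low", "Medium", "High", "Low", "High",
                   "Medium", "Low", "Low", "High", "High"],
})

# Number of rows for each location/importance pair (0 where a pair is absent)
counts = pd.crosstab(data["location"], data["importance"])

# The same table built from the question's groupby: size() counts rows,
# unstack() moves the 'importance' index level into the columns
counts_alt = (data.groupby(["location", "importance"])
                  .size()
                  .unstack(fill_value=0))

# Stacked bars: total height is the number of rows per location
ax = counts.plot(kind="bar", stacked=True)
ax.set_ylabel("number of rows")
```

The unstack step is also one way to continue from the MultiIndex result in the question: it turns the second index level into columns, which is the shape DataFrame.plot expects.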

2 Accepted 2021-02-04 09:26:10
