
Distribution with Seaborn

I have a pandas DataFrame structured like this:

User_id  Calls  Index
7A       8      19-05-2020
10B      5      19-05-2020
7A       2      20-05-2020
10B      6      20-05-2020

The index holds the dates and times at which the users make their calls. I have shown it on the right here, but in pandas the dates are the index, displayed on the left.

I would like to plot, for each user, the distribution of the number of calls they make over the year.

There are currently more than 400 different users, so I will not plot all of them, only those with the largest number of calls, which I can find easily with: ret.groupby(['user_id'])['calls'].sum().sort_values(ascending=False).head(10)

I am trying to write a loop that takes the 10 users with the most calls and plots each one's distribution of calls over the year. The x-axis would show the months of the year (as numbers or names), the y-axis the density or the number of calls, and the title would say which user the graph is for.

How can I do it?

Try something like this: first sum the calls per user and sort the totals in descending order, then keep the first 10 rows.

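# groupby(...)['calls'].sum() returns a Series, and Series.sort_values takes no `by` argument
ret = ret.groupby(['user_id'])['calls'].sum().sort_values(ascending=False)
# keep the 10 users with the most calls
ret = ret.iloc[:10]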
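
The snippet above replaces ret with the per-user totals, which drops the dates needed for plotting. A minimal sketch of one way to keep the original rows of the top users, assuming the lowercase column names from the question's code; the sample rows, the extra user 3C, and the names totals, top_users and top_rows are made up for illustration:

```python
import pandas as pd

# Hypothetical sample in the question's layout (dates as the index).
ret = pd.DataFrame(
    {"user_id": ["7A", "10B", "7A", "10B", "3C"],
     "calls": [8, 5, 2, 6, 1]},
    index=pd.to_datetime(
        ["19-05-2020", "19-05-2020", "20-05-2020", "20-05-2020", "20-05-2020"],
        format="%d-%m-%Y"),
)

top_n = 2  # the question uses 10

# Total calls per user, then the top_n largest totals (sorted descending).
totals = ret.groupby("user_id")["calls"].sum()
top_users = totals.nlargest(top_n)

# Keep every dated row that belongs to one of the top users.
top_rows = ret[ret["user_id"].isin(top_users.index)]
```

Here top_rows still has the datetime index, so it can be resampled or grouped by month afterwards.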

IIUC, try resample with sum to get the total calls per month, then use plot:

*Note: the provided data contains only a single month, so I added a second month to make the graph clearer:

import pandas as pd
from matplotlib import pyplot as plt

df = pd.DataFrame({'User_id': ['7A', '10B', '7A', '10B'], 'Calls': [8, 5, 2, 6],
                   'Index': ['19-05-2020', '19-05-2020', '20-06-2020',
                             '20-06-2020']}).set_index('Index')

df.index = pd.to_datetime(df.index, format='%d-%m-%Y')

users = ','.join(df['User_id'].unique())

plot_df = df.resample('1M')['Calls'].sum()
ax = plot_df.plot(kind='bar',
                  rot=0,
                  title=users,
                  ylabel='Calls',
                  xlabel='Months')

ax.set_xticklabels(plot_df.index.strftime("%Y-%m"))
plt.show()

[Figure: the resulting bar chart]


df:

           User_id  Calls
Index                    
2020-05-19      7A      8
2020-05-19     10B      5
2020-06-20      7A      2
2020-06-20     10B      6

users:

7A,10B

plot_df:

Index
2020-05-31    13
2020-06-30     8
Freq: M, Name: Calls, dtype: int64
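
The plot above sums the calls of all users together, whereas the question asks for one graph per user. A minimal sketch of the per-user loop, assuming the same sample df as above; the helper names monthly_calls and plot_top_users are hypothetical, and months are grouped with to_period('M') instead of resample, which is a substitution rather than the answer's method:

```python
import pandas as pd
from matplotlib import pyplot as plt

# Same sample data as in the answer above.
df = pd.DataFrame({'User_id': ['7A', '10B', '7A', '10B'], 'Calls': [8, 5, 2, 6],
                   'Index': ['19-05-2020', '19-05-2020', '20-06-2020',
                             '20-06-2020']}).set_index('Index')
df.index = pd.to_datetime(df.index, format='%d-%m-%Y')


def monthly_calls(df, user):
    """Total calls per calendar month for one user (index must be datetime)."""
    calls = df.loc[df['User_id'] == user, 'Calls']
    return calls.groupby(calls.index.to_period('M')).sum()


def plot_top_users(df, top_n=10):
    """Draw one bar chart per user for the top_n users by total calls."""
    top = df.groupby('User_id')['Calls'].sum().nlargest(top_n).index
    axes = []
    for user in top:
        fig, ax = plt.subplots()  # one figure per user
        monthly_calls(df, user).plot(kind='bar', rot=0, ax=ax,
                                     title=f'User {user}',
                                     xlabel='Month', ylabel='Calls')
        axes.append(ax)
    return axes
```

Calling plot_top_users(df, top_n=10) followed by plt.show() should display one chart per user, with the user id in the title. Months in which a user made no calls are simply absent from that user's chart.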


 