Distribution with Seaborn
I have a pandas DataFrame structured like this:
User_id | Calls | Index |
---|---|---|
7A | 8 | 19-05-2020 |
10B | 5 | 19-05-2020 |
7A | 2 | 20-05-2020 |
10B | 6 | 20-05-2020 |
The index holds the dates and times at which the users make their calls. I put it in the right-hand column here, but in pandas the dates are the index, on the left.
I would like to plot, for each user, the distribution of the number of calls they make over the year.
There are currently more than 400 different users, so I will not plot all of them, only those with the largest number of calls, which I can find easily with: ret.groupby(['user_id'])['calls'].sum().sort_values(ascending=False).head(10)
I am trying to write a loop that takes the 10 users with the most calls and plots each user's distribution of calls over the year. The x-axis would show the months of the year (as numbers or as names), the y-axis the density or the number of calls, and the title would say which user the graph is for.
How can I do it?
Try something like this: first sum the calls per user and sort the totals in descending order, then keep the first 10 rows.
# groupby(...).sum() returns a Series, and Series.sort_values takes no `by` argument
ret = ret.groupby(['user_id'])['calls'].sum().sort_values(ascending=False)
ret = ret.iloc[:10]
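As a rough sketch of how those ranked totals might then be used, the snippet below keeps the busiest users and pulls their rows back out of the original frame. The sample data, the extra user `3C`, and the names `totals`, `top_users` and `top_rows` are made up for illustration. It keeps 2 users instead of 10 only because the sample is tiny.

```python
import pandas as pd

# Hypothetical sample data using the question's lowercase column names.
ret = pd.DataFrame({
    "user_id": ["7A", "10B", "7A", "10B", "3C"],
    "calls": [8, 5, 2, 6, 1],
})

# Total calls per user, largest first. The result is a Series,
# so sort_values is called without a `by` argument.
totals = ret.groupby("user_id")["calls"].sum().sort_values(ascending=False)

# Keep the two busiest users here (the question would use 10).
top_users = totals.head(2)

# Rows of the original frame that belong to those users.
top_rows = ret[ret["user_id"].isin(top_users.index)]

print(top_users)
print(top_rows)
```

Assigning the totals to a new name rather than overwriting `ret` keeps the per-call rows available for the later plotting step.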
IIUC, try resample with sum to get the total calls per month, then use plot:
*Note: the provided data contains only a single month, so I added a second month to make the graph clearer:
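import pandas as pd
from matplotlib import pyplot as plt

# Sample data with a second month added
df = pd.DataFrame({'User_id': ['7A', '10B', '7A', '10B'], 'Calls': [8, 5, 2, 6],
                   'Index': ['19-05-2020', '19-05-2020', '20-06-2020',
                             '20-06-2020']}).set_index('Index')
# resample needs a DatetimeIndex, so parse the day-month-year strings
df.index = pd.to_datetime(df.index, format='%d-%m-%Y')
# Comma-separated user ids, used as the plot title
users = ','.join(df['User_id'].unique())
# Total calls per month ('M' is month end; pandas 2.2+ prefers the alias 'ME')
plot_df = df.resample('1M')['Calls'].sum()
ax = plot_df.plot(kind='bar',
                  rot=0,
                  title=users,
                  ylabel='Calls',
                  xlabel='Months')
# Label each bar with its year and month instead of the full timestamp
ax.set_xticklabels(plot_df.index.strftime("%Y-%m"))
plt.show()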
df:
           User_id  Calls
Index
2020-05-19      7A      8
2020-05-19     10B      5
2020-06-20      7A      2
2020-06-20     10B      6
users:
7A,10B
plot_df:
Index
2020-05-31    13
2020-06-30     8
Freq: M, Name: Calls, dtype: int64
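The question asks for one graph per top user, whereas the plot above sums all users together. One possible way to loop over the users is sketched below, assuming the data covers a single year so that grouping by month number is unambiguous. The helper names `monthly_calls` and `plot_top_users`, the extra user `3C`, and the sample rows are invented for illustration. It groups on the month number of the index rather than using a resample alias, so the x-axis always shows months 1 to 12.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical sample data in the same shape as the question's frame.
df = pd.DataFrame(
    {
        "User_id": ["7A", "10B", "7A", "10B", "3C"],
        "Calls": [8, 5, 2, 6, 1],
    },
    index=pd.to_datetime(
        ["19-05-2020", "19-05-2020", "20-06-2020", "20-06-2020", "20-06-2020"],
        format="%d-%m-%Y",
    ),
)


def monthly_calls(frame, user):
    """Return one user's total calls for each calendar month (1-12)."""
    user_calls = frame.loc[frame["User_id"] == user, "Calls"]
    by_month = user_calls.groupby(user_calls.index.month).sum()
    # Months without any calls are shown as 0 rather than left out.
    return by_month.reindex(range(1, 13), fill_value=0)


def plot_top_users(frame, n=10):
    """Draw one bar chart per user for the n users with the most calls."""
    top_users = frame.groupby("User_id")["Calls"].sum().nlargest(n).index
    axes = []
    for user in top_users:
        fig, ax = plt.subplots()
        monthly_calls(frame, user).plot(kind="bar", rot=0, ax=ax)
        ax.set_title(f"User {user}")
        ax.set_xlabel("Month")
        ax.set_ylabel("Calls")
        axes.append(ax)
    return axes


# Two users here because the sample is tiny; the question would use n=10.
axes = plot_top_users(df, n=2)
```

Calling `plt.show()` afterwards displays the figures. If the data spans several years, grouping by month number would merge the same month across years, so filtering to one year first would probably be needed.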