Pandas histogram bins alignment
I have a data frame that looks like this:
train_data_10users = pd.DataFrame({'target':['A','A','B', 'B', 'C'], 'day_of_week':[4,2,4,4,1]})
target day_of_week
0 A 4
1 A 2
2 B 4
3 B 4
4 C 1
I want to create a histogram of counts by day_of_week for each target, i.e.
"A" should have:
0,1,3,5,6:0
2,4:1
"B" should have
0,1,2,3,5,6:0
4:2
"C" should have 1:1, the rest:0
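As a cross-check on the expected counts listed above, here is a minimal sketch that computes them for the toy data frame. It assumes pd.crosstab plus a reindex over range(7) is an acceptable way to make the missing days show up as zeros; the name counts is only illustrative.

```python
import pandas as pd

train_data_10users = pd.DataFrame(
    {"target": ["A", "A", "B", "B", "C"], "day_of_week": [4, 2, 4, 4, 1]}
)

# Count rows per (target, day_of_week); only days that occur become columns.
counts = pd.crosstab(train_data_10users["target"], train_data_10users["day_of_week"])

# Reindex the columns so every day 0..6 is present, filling absent days with 0.
counts = counts.reindex(columns=range(7), fill_value=0)

print(counts)
```

Each row of counts then holds the seven bar heights the histogram for that target should show.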
Here is a pivot table of the real data that I want to show on the histograms (note the fillna: missing days are filled with 0):
pivot = pd.pivot_table(train_data_10users,
index=["target"], columns=["day_of_week"], aggfunc='size', fill_value=0)
day_of_week 0 1 2 3 4 5 6
target
Ashley 390 328 1078 293 115 0 0
Avril 148 402 273 318 87 104 311
Bill 308 239 105 24 54 7 65
Bob 51 285 72 284 330 0 0
Adding proper xticks does the trick, even though some days may be missing in the groupby:
from matplotlib import pyplot as plt
import pandas as pd
fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(16, 10))
for idx, (user, sub_df) in enumerate(
        train_data_10users[["target", "day_of_week"]].groupby('target')):
    ax = axes[idx // 4, idx % 4]
    # color_dic: the asker's user-to-color dict, not shown here
    sub_df.hist(ax=ax, label=user, color=color_dic.get(user), bins=7)
    ax.set_xticks(range(7))
    ax.legend()
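But the values are not exactly aligned/centered, and their position floats a bit, which I think depends on how many days are present/missing for each target.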
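A likely reason for the floating positions: with an integer bins=7, the bin edges are spread evenly between the minimum and maximum of each group's own data, so they differ from target to target. The sketch below illustrates this with np.histogram (which matplotlib's hist relies on) for target "A" from the toy data; the variable names are only illustrative.

```python
import numpy as np

days_a = np.array([4, 2])  # day_of_week values for target "A"

# Integer bins: 7 equal-width bins between min(days_a) and max(days_a).
counts_auto, edges_auto = np.histogram(days_a, bins=7)

# Explicit edges 0, 1, ..., 7: one unit-wide bin per day, identical for every target.
counts_fixed, edges_fixed = np.histogram(days_a, bins=range(8))

print(edges_auto)    # edges span only 2..4 for this group
print(edges_fixed)   # edges span 0..7 regardless of the data
print(counts_fixed)
```

With explicit edges, bar i always covers the interval from i to i + 1, so a tick at i + 0.5 sits in the middle of the bar for day i.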
Update. This is how it looks following the accepted answer:
fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(16, 10), sharey=True)
...
sub_df.hist(ax=ax, label=user, color=color_dic.get(user), bins=range(8))
# one tick at the center of each unit-wide bin: 7 ticks for 7 labels
ax.set_xticks(np.arange(7) + 0.5)
ax.set_xticklabels(range(7))
Try:
import numpy as np
fig, axes = plt.subplots(nrows=3, ncols=4, figsize=(16, 10))
for idx, (user, sub_df) in enumerate(
        train_data_10users[["target", "day_of_week"]].groupby('target')):
    ax = axes[idx // 4, idx % 4]
    # force the bin edges to 0..7 so each day 0-6 gets its own bin;
    # range(7) would give only six bins and merge days 5 and 6
    sub_df.hist(ax=ax, label=user, bins=range(8))
    # offset the xticks to the bin centers
    ax.set_xticks(np.arange(7) + .5)
    # label the ticks with the day numbers
    ax.set_xticklabels(range(7))
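An alternative that avoids offsetting the ticks is to shift the bin edges by half a unit instead, so each bin is centered on an integer day. This is a sketch on the toy data frame, not the approach from the answer above; the one-row subplot layout and the Agg backend are assumptions made only to keep it self-contained.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

train_data_10users = pd.DataFrame(
    {"target": ["A", "A", "B", "B", "C"], "day_of_week": [4, 2, 4, 4, 1]}
)

# Edges at -0.5, 0.5, ..., 6.5: seven bins, each centered on a day 0..6.
edges = np.arange(8) - 0.5

grouped = train_data_10users.groupby("target")
fig, axes = plt.subplots(nrows=1, ncols=grouped.ngroups, figsize=(12, 3), sharey=True)
for ax, (user, sub_df) in zip(axes, grouped):
    sub_df["day_of_week"].hist(ax=ax, bins=edges, label=user)
    ax.set_xticks(range(7))  # integer ticks already sit at the bin centers
    ax.legend()

fig.savefig("hist_by_target.png")
```

Because the edges are the same for every target, the bars line up across subplots even when a target has no rows for some days.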