How to plot a time series to study the frequency of items?
I need to plot the same values over time to see how their frequency changes, in particular posts generated by different users over time. I have a dataset like the following:
    GENDER        POST  DATE  COUNTER
0      men  (post 103)    36       43
1      men  (post 109)    38        2
2      men  (post 116)    41       12
3      men  (post 119)    42       32
4      men  (post 124)    44        2
..     ...         ...   ...      ...
82   women   (post 83)    29       34
83   women   (post 86)    30        2
84   women   (post 86)    65        9
85   women   (post 91)    32        5
86   women   (post 99)    35        5
where DATE is numerical (sequential numbers rather than a date format). My initial idea was to plot the columns I am interested in using seaborn:
from matplotlib import pyplot
import seaborn
fg = seaborn.FacetGrid(data=df_, hue='GENDER', aspect=1.61)
fg.map(pyplot.scatter, 'DATE', 'COUNTER').add_legend()
but what I would like is something like the plot shown in the picture below:
https://imgur.com/bAKogi9
I think I should treat this as a time series in order to track posts through time. On the x-axis of each plot there would be the date (DATE) and on the y-axis the post's frequency (COUNTER).
The CSV file that I am considering for this analysis includes the following columns:
import csv
import pandas as pd

file = '...'
# Print the raw rows to inspect the file
with open(file, newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=';', quotechar='|')
    for row in reader:
        print(' '.join(row))

df = pd.read_csv(file, sep=';')  # or your sep in file
df.columns = [' ', 'GENDER', 'POST', 'DATE', 'COUNTER', ' ']
Thank you so much for your time and for helping me.
Update:
GENDER POST DATE COUNTER
0 (man 8) (post 4) 0 0 NaN
1 (woman 13) (post 1) 2 0 NaN
2 (man 14) (post 7) 2 2 NaN
3 (man 8) (post 4) 4 1 NaN
4 (woman 19) (post 12) 4 1 NaN
First, let's filter your dataframe so you only have a handful of posts:
import seaborn as sns
post_list = ['(post 103)','(post 109)','(post 116)']
df2 = df[df.POST.isin(post_list)]
Then, something like this should do:
import matplotlib.pyplot as plt
# One figure per post, with one line per gender
for post in df2.POST.unique():
    sns.lineplot(x='DATE', y='COUNTER', hue='GENDER', data=df2[df2.POST == post])
    plt.show()
If you don't care about confidence intervals, you can add ci=None to the sns call, which will make the code run faster. In seaborn 0.12 and later, ci is deprecated in favor of errorbar=None.
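If you would rather see all the selected posts side by side in a single figure, here is a minimal sketch of the same idea. It uses plain matplotlib subplots instead of seaborn, and it averages COUNTER per (GENDER, DATE) to mirror lineplot's default mean estimator. The sample data is made up to match the shape of the table in the question, and plot_posts is a hypothetical helper name, not something from seaborn or pandas.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data in the same shape as the question's table (assumption).
df = pd.DataFrame({
    "GENDER": ["men", "men", "women", "women", "men", "women", "men", "women"],
    "POST": ["(post 103)"] * 4 + ["(post 109)"] * 4,
    "DATE": [36, 38, 36, 38, 38, 38, 41, 41],
    "COUNTER": [43, 12, 5, 9, 2, 34, 7, 2],
})


def plot_posts(data, posts):
    """Draw one panel per post, with one COUNTER-vs-DATE line per gender.

    Hypothetical helper for illustration only.
    """
    fig, axes = plt.subplots(1, len(posts), figsize=(5 * len(posts), 3.5),
                             sharey=True, squeeze=False)
    for ax, post in zip(axes[0], posts):
        subset = data[data["POST"] == post]
        # Average COUNTER per (GENDER, DATE); groupby also sorts by DATE.
        means = (subset.groupby(["GENDER", "DATE"])["COUNTER"]
                 .mean()
                 .reset_index())
        for gender, grp in means.groupby("GENDER"):
            ax.plot(grp["DATE"], grp["COUNTER"], marker="o", label=gender)
        ax.set_title(post)
        ax.set_xlabel("DATE")
        ax.legend(title="GENDER")
    axes[0][0].set_ylabel("COUNTER")
    fig.tight_layout()
    return fig


fig = plot_posts(df, ["(post 103)", "(post 109)"])
```

With your own data you would pass df2 and post_list instead, then call plt.show() or fig.savefig(...) on the result. Sharing the y-axis makes it easier to compare frequencies across posts; set sharey=False if the counts differ a lot between posts.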