
How to plot a time-series to study frequency of items?

I need to plot the same values over time to see how their frequency changes, in particular posts generated by different users over time. I have a dataset like the following:

       GENDER          POST  DATE  COUNTER
0      men    (post 103)     36        43
1      men    (post 109)     38        2
2      men    (post 116)     41        12
3      men    (post 119)     42        32
4      men    (post 124)     44        2
..       ...           ...   ...      ...
82   women     (post 83)     29        34
83   women     (post 86)     30        2
84   women     (post 86)     65        9
85   women     (post 91)     32        5
86   women     (post 99)     35        5

where DATE is numeric (sequential numbers rather than a date format). My initial idea was to plot the columns I am interested in using seaborn:

from matplotlib import pyplot
import seaborn

fg = seaborn.FacetGrid(data=df_, hue='GENDER', aspect=1.61)
fg.map(pyplot.scatter, 'DATE', 'COUNTER').add_legend()

but I would like something closer to the plot shown in the picture below:

https://imgur.com/bAKogi9

I think I should treat the data as a time series in order to track posts over time. On the x-axis of each plot there would be the date (DATE) and on the y-axis the post's frequency (COUNTER).

The CSV file that I am using for this analysis is read as follows and includes these columns:

import csv
import pandas as pd

file = '...'

# Print the raw rows to check the separator
with open(file, newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=';', quotechar='|')
    for row in reader:
        print(' '.join(row))

df = pd.read_csv(file, sep=';')  # or your sep in file
df.columns = [' ', 'GENDER', 'POST', 'DATE', 'COUNTER', ' ']

Thank you so much for your time and for helping me.

Update:

        GENDER       POST  DATE  COUNTER
0      (man 8)   (post 4)     0        0  NaN
1   (woman 13)   (post 1)     2        0  NaN
2     (man 14)   (post 7)     2        2  NaN
3      (man 8)   (post 4)     4        1  NaN
4   (woman 19)  (post 12)     4        1  NaN
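
A note on the trailing NaN column: one possible cause, assuming each line of the file ends with a separator (a guess, since the file itself is not shown), is that pandas reads the empty field after the last `;` as an extra column. A minimal sketch with made-up sample data that drops such all-empty columns instead of renaming them:

```python
import io

import pandas as pd

# Hypothetical file content: every line ends with a trailing ';'
raw = """GENDER;POST;DATE;COUNTER;
men;(post 103);36;43;
women;(post 83);29;34;
"""

df = pd.read_csv(io.StringIO(raw), sep=";")

# The trailing separator yields an unnamed column with only NaN values;
# drop every column that is entirely empty.
df = df.dropna(axis=1, how="all")

print(df.columns.tolist())
```

If the real file has a different layout, the column handling would need to be adapted accordingly.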

First, let's filter your dataframe so you only have a handful of posts:

import seaborn as sns

post_list = ['(post 103)','(post 109)','(post 116)']
df2 = df[df.POST.isin(post_list)]

Then, something like this should do:

import matplotlib.pyplot as plt

# Draw one set of lines per post on the same axes
for post in df2.POST.unique():
    sns.lineplot(x='DATE', y='COUNTER', hue='GENDER', data=df2[df2.POST == post])
plt.show()

If you don't need confidence intervals, you can pass ci=None to the sns.lineplot call, which makes the code run faster. (In seaborn 0.12 and later, ci is deprecated in favor of errorbar=None.)
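
If you would rather have one panel per post, as the question describes, the same idea can be sketched with plain matplotlib subplots. This is only a sketch: the sample data, the `plot_posts` helper, and the output file name are made up for illustration, and it uses matplotlib directly rather than seaborn.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample shaped like the question's data
df = pd.DataFrame({
    "GENDER": ["men", "men", "women", "women", "men", "women"],
    "POST": ["(post 103)", "(post 103)", "(post 103)",
             "(post 109)", "(post 109)", "(post 109)"],
    "DATE": [36, 38, 36, 30, 38, 32],
    "COUNTER": [43, 2, 12, 2, 5, 9],
})


def plot_posts(data, posts):
    """Draw one panel per post with one line per gender; return the figure."""
    fig, axes = plt.subplots(1, len(posts), figsize=(5 * len(posts), 3.5),
                             sharey=True, squeeze=False)
    for ax, post in zip(axes[0], posts):
        subset = data[data["POST"] == post]
        for gender, grp in subset.groupby("GENDER"):
            grp = grp.sort_values("DATE")  # keep each line ordered in time
            ax.plot(grp["DATE"], grp["COUNTER"], marker="o", label=gender)
        ax.set_title(post)
        ax.set_xlabel("DATE")
        ax.legend(title="GENDER")
    axes[0][0].set_ylabel("COUNTER")
    return fig


fig = plot_posts(df, ["(post 103)", "(post 109)"])
fig.savefig("posts_over_time.png")  # or plt.show() in an interactive session
```

Sharing the y-axis makes the COUNTER values comparable across posts; drop sharey=True if the posts have very different ranges.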
