How to plot a time-series to study frequency of items?

I need to plot the same values over time to see how their frequency changes, in particular posts generated by different users over time. I have a dataset like the following:

       GENDER          POST  DATE  COUNTER
0      men    (post 103)     36        43
1      men    (post 109)     38        2
2      men    (post 116)     41        12
3      men    (post 119)     42        32
4      men    (post 124)     44        2
..       ...           ...   ...      ...
82   women     (post 83)     29        34
83   women     (post 86)     30        2
84   women     (post 86)     65        9
85   women     (post 91)     32        5
86   women     (post 99)     35        5

where DATE is numerical (sequential numbers rather than a date format). My initial idea was to select the columns I am interested in and plot them with seaborn:

from matplotlib import pyplot
import seaborn

fg = seaborn.FacetGrid(data=df_, hue='GENDER', aspect=1.61)
fg.map(pyplot.scatter, 'DATE', 'COUNTER').add_legend()

but in order to have something like the plot shown in the picture below:

https://imgur.com/bAKogi9

I think I should treat the data as a time series in order to track posts over time. On the x-axis of each plot there would be the date (DATE) and on the y-axis the post's frequency (COUNTER).

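The CSV file I am using for this analysis is read, and its columns named, as follows:

import csv
import pandas as pd

file = '...'

# Print the raw rows to inspect the file
with open(file, newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=';', quotechar='|')
    for row in reader:
        print(' '.join(row))

# Load the same file into a DataFrame
df = pd.read_csv(file, sep=';')  # or your sep in file
# The first and last columns are unnamed in the file
df.columns = [' ', 'GENDER', 'POST', 'DATE', 'COUNTER', ' ']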

Thank you so much for your time and for helping me.

Update:

         GENDER       POST  DATE  COUNTER
0       (man 8)   (post 4)     0        0  NaN
1    (woman 13)   (post 1)     2        0  NaN
2      (man 14)   (post 7)     2        2  NaN
3       (man 8)   (post 4)     4        1  NaN
4    (woman 19)  (post 12)     4        1  NaN

First, let's filter your dataframe so you only have a handful of posts:

import matplotlib.pyplot as plt
import seaborn as sns

post_list = ['(post 103)','(post 109)','(post 116)']
df2 = df[df.POST.isin(post_list)]
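
Before plotting, it can help to check what the filter actually kept, since isin only matches labels that are exactly equal (parentheses and spacing included). Below is a minimal sketch of that check; the sample rows are made up for illustration and only mimic the shape of the data in the question.

```python
import pandas as pd

# Hypothetical sample rows shaped like the question's data (not the real file).
df = pd.DataFrame({
    "GENDER": ["men", "men", "men", "women", "women"],
    "POST": ["(post 103)", "(post 109)", "(post 103)", "(post 86)", "(post 86)"],
    "DATE": [36, 38, 40, 30, 65],
    "COUNTER": [43, 2, 12, 2, 9],
})

post_list = ["(post 103)", "(post 109)", "(post 116)"]

# Keep only the rows whose POST label is in the list.
df2 = df[df["POST"].isin(post_list)]

# Labels requested but absent from the data (typos, different spacing, ...).
missing = sorted(set(post_list) - set(df2["POST"]))

print(df2["POST"].value_counts())
print("not found:", missing)
```

If a label you expected shows up under "not found", compare it against df["POST"].unique() to see how it is really spelled in the file.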

Then, something like this should do:

for post in df2.POST.unique():
    sns.lineplot(x='DATE',y='COUNTER', hue='GENDER', data=df2[df2.POST==post])
plt.show()

If you don't care about confidence intervals, you can pass ci=None to the sns.lineplot call (errorbar=None in seaborn 0.12 and later, where ci is deprecated), which skips the bootstrap and makes the code run faster.

