
Pandas visualization time series

I have time series data for classes. The first column contains the join time, the second column contains the leave time for each student, and the third column is the Class ID. A student may leave the class after 10 minutes and rejoin some time later, in which case both the join and the leave are recorded again. I want to visualize the data to see at what times the most students were attending class.

data['Join Time Hour'] = data['Join Time'].dt.hour        
data['Join Time Date'] = data['Join Time'].dt.date
data['Leave Time Hour'] = data['Leave Time'].dt.hour        
data['Leave Time Date'] = data['Leave Time'].dt.date

My approach:

1.

# Added one dummy
data['Dummy Column'] = 1     

2.

data1 = (
    pd.pivot_table(data, 
                   values='Dummy Column', 
                   index='Join Time Date', 
                   columns='Join Time Hour', 
                   aggfunc='sum')  
)

3.

sns.heatmap(data1, cmap="Blues")
plt.show()

Output: [image]

This output gives me a heatmap based on the sum of the dummy variable for each hour of class. It does not take leave time into account.

I want to:

  1. visualize (in a heatmap or other visualization) exactly which moments in the videos are most watched, and
  2. see where students tend to drop off for the last time.

Thanks!

I think a Sankey diagram can solve your problem. Below is my test code.

import pandas as pd
import numpy as np
from itertools import product
import seaborn as sns

from plotly.offline import init_notebook_mode, iplot

init_notebook_mode(connected=True)

# generate test hours between 8:00 and 12:00
times = pd.date_range("202005010800", '202005011200', freq="H")

# generate 1000 test student ids
students = list(np.arange(1000))

# generate test class
classes = ["class A", "class B"]

# make nodes including classes, join times and leave times
nodes = []
join_time = [time.strftime("join time: %Y%m%d %H") for time in times]
leave_time = [time.strftime("leave time: %Y%m%d %H") for time in times]
nodes.extend(classes)
nodes.extend(join_time)
nodes.extend(leave_time)

# every node has a color and an id
df_nodes = pd.DataFrame(nodes, columns=['node'])
df_nodes['color'] = list(sns.palettes.xkcd_rgb.values())[:len(df_nodes)]
df_nodes['node id'] = df_nodes.index

# nodes dict used in links 
nodes_id_dict = dict(zip(df_nodes['node'], df_nodes['node id']))
nodes_color_dict = dict(zip(df_nodes['node'], df_nodes['color']))

# make records
records = product(times, times, students, classes)

# filter records whose leave time is later than join time
records = [record for record in records if record[1] > record[0]]

df_records = pd.DataFrame(
    records, columns=['join time', 'leave time', 'student', 'class']
)

#  pick 10000 records randomly
df_records = df_records.sample(10000)

# format time to use nodes dict 
df_records['join time'] = df_records['join time'].\
dt.strftime("join time: %Y%m%d %H")
df_records['leave time'] = df_records['leave time'].\
dt.strftime("leave time: %Y%m%d %H")

# the first link from class to join time
class_join_time = df_records.groupby(['class', 'join time']) \
['student'].count().reset_index()
class_join_time.columns = ['source', 'target', 'value']

# the second link from join time to leave time
join_leave_time = df_records.groupby(['join time', 'leave time'])\
['student'].count().reset_index()
join_leave_time.columns = ['source', 'target', 'value']

# merge the two links
df_links = pd.concat([class_join_time, join_leave_time])

# use nodes dict to get node id and link color
# you can also define your own colors
df_links['source id'] = df_links['source'].replace(nodes_id_dict)
df_links['target id'] = df_links['target'].replace(nodes_id_dict)
df_links['link color'] = df_links['target'].replace(nodes_color_dict)

# configure the data_trace
data_trace = dict(
    type='sankey',
    domain = dict(
      x =  [0,1],
      y =  [0,1]
    ),
    orientation = "h", # horizontal
    valueformat = ".0f", 
    node = dict(
       pad = 10,
       line = dict(
         color = "rgba(0,0,0,0.5)",
         width = 0.1
      ),
         label =  df_nodes['node'],
         color = df_nodes['color']
    ),
    link = dict(
         source = df_links['source id'],
         target = df_links['target id'],
         value = df_links['value'],
         color = df_links['link color'],
         line = dict(
             color = "rgba(0,0,0,0.5)",
             width = 0.1
        ),
    )
)
# configure the layout
layout = dict(
    title = "Sankey Diagram Test",
    height = 640,
    width = 900,
    font = dict(
        size=12
    )
)

# plot
fig = dict(data=[data_trace], layout=layout)
iplot(fig, validate=False)

The Sankey diagram is shown below: [image: Sankey diagram]
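The Sankey diagram shows flows from join hours to leave hours, but not how many students are present at a given moment. A minimal sketch of one way to get that number is below: treat every join as +1 and every leave as -1, then take a running total in time order. The helper name `concurrent_attendance` and the small `sessions` frame are made up for illustration; only the `Join Time` and `Leave Time` column names come from the question.

```python
import pandas as pd


def concurrent_attendance(df, join_col="Join Time", leave_col="Leave Time"):
    """Return the number of students present after each join/leave event.

    Hypothetical helper: assumes both columns already hold datetimes.
    """
    joins = pd.Series(1, index=pd.DatetimeIndex(df[join_col]))
    leaves = pd.Series(-1, index=pd.DatetimeIndex(df[leave_col]))
    events = pd.concat([joins, leaves])
    # Net change per timestamp, so a join and a leave at the same instant cancel out.
    deltas = events.groupby(level=0).sum().sort_index()
    # Running total of the net changes = students present from that time onward.
    return deltas.cumsum()


# Made-up sessions; student "a" leaves and rejoins, as described in the question.
sessions = pd.DataFrame({
    "Student": ["a", "b", "a", "c"],
    "Join Time": pd.to_datetime([
        "2020-05-01 08:00", "2020-05-01 08:05",
        "2020-05-01 08:30", "2020-05-01 08:40",
    ]),
    "Leave Time": pd.to_datetime([
        "2020-05-01 08:10", "2020-05-01 09:00",
        "2020-05-01 09:00", "2020-05-01 08:50",
    ]),
})

present = concurrent_attendance(sessions)
peak_time = present.idxmax()   # first moment the maximum is reached
peak_count = present.max()
print(peak_time, peak_count)
```

The resulting Series is a step function, so if a plot is wanted, `present.plot(drawstyle="steps-post")` should give an attendance curve over time.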

I tried a visualization with a heat map. The data is generated to suit the example.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

join_t = pd.date_range('2020-04-01 09:00:00', '2020-04-30 20:00:00', freq='BH')
leave_t = pd.date_range('2020-04-01 09:00:00', '2020-04-30 20:00:00', freq='BH')
persons = np.random.randint(1,40,(176,))

df = pd.DataFrame({'Join Time':pd.to_datetime(join_t),
                  'Leave Time':pd.to_datetime(leave_t),
                  'Persons':persons})

df['Leave Time'] = df['Leave Time'].shift(-1, fill_value=df.iloc[-1]['Leave Time'])
df['Join Time Hour'] = df['Join Time'].dt.hour    
df['Join Time Date'] = df['Join Time'].dt.date
df['Leave Time Hour'] = df['Leave Time'].dt.hour       
df['Leave Time Date'] = df['Leave Time'].dt.date

df.loc[:,['Join Time Date','Join Time Hour','Persons']]


fig = plt.figure(figsize=(8,6),dpi=144)
ax = fig.add_subplot(111)

data = df.pivot(index='Join Time Date', columns='Join Time Hour', values='Persons')
ax = sns.heatmap(data, ax=ax, annot=True, cmap="YlGnBu")

plt.show()

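The heatmap above pivots on join time only. As a rough sketch of how leave time could be brought in, each session can be expanded into every clock hour it touches, and distinct students can then be counted per date and hour. The same frame also gives each student's last leave per day, which speaks to the drop-off part of the question. The `Student` column, the helper names, and the sample rows are assumptions for illustration, not part of the original data.

```python
import pandas as pd


def hourly_occupancy(df, student_col="Student",
                     join_col="Join Time", leave_col="Leave Time"):
    """Distinct students present per (date, hour), using join and leave time.

    Hypothetical helper. A session that ends exactly on the hour is still
    counted in that hour.
    """
    out = df[[student_col, join_col, leave_col]].copy()
    # One list of hourly slots per session, from the join hour to the leave hour.
    slots = [
        list(pd.date_range(j.floor("h"), l.floor("h"), freq="h"))
        for j, l in zip(out[join_col], out[leave_col])
    ]
    out["slot"] = pd.Series(slots, index=out.index)
    out = out.explode("slot")  # one row per (session, hour)
    out["slot"] = pd.to_datetime(out["slot"])
    out["Date"] = out["slot"].dt.date
    out["Hour"] = out["slot"].dt.hour
    # nunique, so a student who rejoins within the same hour is counted once.
    return out.pivot_table(index="Date", columns="Hour", values=student_col,
                           aggfunc="nunique", fill_value=0)


def last_dropoff_by_hour(df, student_col="Student", leave_col="Leave Time"):
    """Count, per hour of day, how many students left for the last time that day."""
    day = df[leave_col].dt.date.rename("Date")
    last_leave = df.groupby([df[student_col], day])[leave_col].max()
    return last_leave.dt.hour.value_counts().sort_index()


# Made-up sessions; student "a" leaves and rejoins.
sessions = pd.DataFrame({
    "Student": ["a", "b", "a", "c"],
    "Join Time": pd.to_datetime([
        "2020-05-01 08:00", "2020-05-01 08:05",
        "2020-05-01 08:30", "2020-05-01 08:40",
    ]),
    "Leave Time": pd.to_datetime([
        "2020-05-01 08:10", "2020-05-01 09:00",
        "2020-05-01 09:00", "2020-05-01 08:50",
    ]),
})

occupancy = hourly_occupancy(sessions)
dropoff = last_dropoff_by_hour(sessions)
print(occupancy)
print(dropoff)
```

The `occupancy` frame has the same date-by-hour shape as the pivot used earlier, so it can be passed to `sns.heatmap(occupancy, annot=True, cmap="YlGnBu")` in the same way.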
