How to plot a horizontal bar plot in a loop to change the bar color based on the value in another column

I need to plot a single horizontal bar chart of time (timestamp) vs. space (IntersectionId) in matplotlib. The color of the bar should change at time intervals based on another column, currState. The colors will be red, green and yellow. I have tried to create a dictionary of colors and values, but I am unsure how to use it in a loop to change the color based on the value. Below are the code I have written so far, a sample CSV, and what I am trying to achieve.

category_colors = { 'red' : [2,3] , 'yellow' : [5,6] , 'green' : [7,8]}
date_test =  df_sample['timestamp']
y_test = ['123456']
data = np.array(list(df_sample.currState))
fig, ax = plt.subplots(figsize=(10, 1))
ax = plt.barh(y_test,date_test,label="trafficsignal")
data_cum = data.cumsum
plt.xlabel('timestamp') 
plt.ylabel('space')
plt.title('TimeSpace')
plt.legend()
plt.show()

timestamp                                       currState          IntersectionId
2020-02-26 16:12:13.131484                        3                    12345
2020-02-26 16:12:14.131484                        3                    12345
2020-02-26 16:12:15.131484                        3                    12345
2020-02-26 16:12:16.131484                        5                    12345
2020-02-26 16:12:17.131484                        5                    12345
2020-02-26 16:12:18.131484                        5                    12345
2020-02-26 16:12:19.131484                        6                    12345
2020-02-26 16:12:20.131484                        6                    12345
2020-02-26 16:12:21.131484                        6                    12345

Current plot:

[image: plot]

Desired plot:

[image: the plot I want to achieve]

I am not aware of any plotting package that lets you create this plot in a straightforward way given how your sample table is structured. One option could be to compute a start and an end variable and then create the plot like in the answers to this question, for example using the Altair Gantt chart like in this answer.
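
As a rough sketch of the start/end idea, the snippet below collapses consecutive rows with the same currState into one row per period using pandas. The column names come from the question's sample table; the `run` label and the `periods` name are only illustrative, and this is one possible way to do it rather than the approach used in the linked answers.

```python
import pandas as pd

# Small sample shaped like the table in the question
df_sample = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-02-26 16:12:13", "2020-02-26 16:12:14", "2020-02-26 16:12:15",
        "2020-02-26 16:12:16", "2020-02-26 16:12:17", "2020-02-26 16:12:18",
        "2020-02-26 16:12:19", "2020-02-26 16:12:20", "2020-02-26 16:12:21",
    ]),
    "currState": [3, 3, 3, 5, 5, 5, 6, 6, 6],
    "IntersectionId": 12345,
})

# Give each run of consecutive identical states its own integer label
changed = df_sample["currState"] != df_sample["currState"].shift()
run = changed.cumsum().rename("run")

# One row per run: state code, first timestamp and last timestamp
periods = (
    df_sample.groupby(run)
    .agg(currState=("currState", "first"),
         start=("timestamp", "min"),
         end=("timestamp", "max"))
    .reset_index(drop=True)
)
print(periods)
```

Here `end` is the last timestamp observed in each period, so a Gantt-style plot built from it would leave a one-second gap between consecutive bars unless `end` is extended to the next period's `start`.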

Here, I offer two solutions using matplotlib. Looking through the matplotlib gallery, I came across the broken_barh plotting function, which provides a way to create a plot like the one you want. There are two main hurdles to overcome when using it:

  1. Deciding what unit to use for the x-axis and computing the xranges argument accordingly;
  2. Creating and formatting the x ticks and tick labels.

Let me first create a sample dataset that resembles yours. Note that you will need to adjust color_dict to your codes:

import numpy as np                 # v 1.19.2
import pandas as pd                # v 1.1.3
import matplotlib.pyplot as plt    # v 3.3.2
import matplotlib.dates as mdates

## Create sample dataset

# Light color codes
gre = 1
yel_to_red = 2
red = 3
yel_to_gre = 4
color_dict = {1: 'green', 2: 'yellow', 3: 'red', 4: 'yellow'}

# Light color duration in seconds
sec_g = 45
sec_yr = 3
sec_r = 90
sec_yg = 1

# Light cycle
light_cycle = [gre, yel_to_red, red, yel_to_gre]
sec_cycle = [sec_g, sec_yr, sec_r, sec_yg]

ncycles = 3
sec_total = ncycles*sum(sec_cycle)

# Create variables and store them in a pandas dataframe with the datetime as index
IntersectionId = 12345
currState = np.repeat(ncycles*light_cycle, repeats=ncycles*sec_cycle)
time_sec = pd.date_range(start='2021-01-04 08:00:00', freq='S', periods=sec_total)
df = pd.DataFrame(dict(IntersectionId = np.repeat(IntersectionId, repeats=sec_total),
                       currState = currState),
                  index = time_sec)
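
To adjust color_dict to the codes used in the question, one option is to invert the question's category_colors dictionary, which maps each color to a list of codes, into a code-to-color mapping. This is a minimal sketch that assumes the codes listed in the question are the complete set.

```python
# Dictionary from the question: color -> list of currState codes
category_colors = {'red': [2, 3], 'yellow': [5, 6], 'green': [7, 8]}

# Invert it so that each currState code maps to a single color
color_dict = {code: color
              for color, codes in category_colors.items()
              for code in codes}
print(color_dict)
```

A mapping in this direction can be passed straight to the pandas map method, as done in the code further down.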

The broken_barh function takes the data in the format of tuples: for each colored rectangle that makes up the horizontal bar, you need to provide the xy coordinates of the bottom-left corner as well as the length along each axis, like so:

xranges=[(x1_start, x1_length), (x2_start, x2_length), ... ], yrange=(y_all_start, y_all_height)

Note that the y range (the second argument) applies to all rectangles. The unit chosen for the x-axis determines how the data must be processed and how the x ticks and tick labels can be created. Here are two alternatives.
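
Before dealing with dates, the argument format can be tried out on plain numbers. The sketch below draws three rectangles of three units each, with made-up values that mimic the nine rows of the question's sample (codes 3, 5, 6 mapped to red, yellow, yellow). The Agg backend is selected only so that the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 1.5))

# One (start, length) tuple per rectangle along the x-axis
xranges = [(0, 3), (3, 3), (6, 3)]
# A single (bottom, height) tuple shared by all rectangles
yrange = (0.75, 0.5)

bars = ax.broken_barh(xranges, yrange, facecolors=['red', 'yellow', 'yellow'])
ax.set_xlim(0, 9)
ax.set_ylim(0, 2)
ax.set_yticks([1])
```

Calling fig.savefig or, with an interactive backend, plt.show() then renders the bar.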



Matplotlib broken_barh with matplotlib date number as x-axis scale

In this approach, the timestamps of the rows where the light changes are extracted and then converted to matplotlib date numbers. This makes it possible to use a matplotlib date tick locator and formatter. This approach of using the matplotlib date for the x-axis values to simplify tick formatting was inspired by this answer by ImportanceOfBeingErnest.
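
Matplotlib date numbers are floats counted in days, so one second corresponds to 1/86400 of a unit. A small check with two arbitrary example timestamps one second apart:

```python
from datetime import datetime
import matplotlib.dates as mdates

# Two example timestamps exactly one second apart
t0 = datetime(2021, 1, 4, 8, 0, 0)
t1 = datetime(2021, 1, 4, 8, 0, 1)

# date2num converts datetimes to matplotlib date numbers (floats, in days)
x0, x1 = mdates.date2num([t0, t1])
one_second = x1 - x0
print(one_second, 1 / 86400)
```

The two printed values agree up to floating-point rounding, which is why bar lengths computed as differences of date numbers can be used directly as x lengths.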

For both this solution and the next one, the code for getting the indices of light changes and computing the lengths of the periods is based on this answer by Jaime, thanks to the general idea provided by this Gist by alimanfoo.
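
The change-detection step is easier to follow on a tiny array. The sketch below applies the same NumPy idiom to the nine currState values from the question's sample table:

```python
import numpy as np

states = np.array([3, 3, 3, 5, 5, 5, 6, 6, 6])

# True at index 0 and wherever a value differs from the one before it
is_start = np.concatenate(([True], states[:-1] != states[1:]))

# Indices where a new run begins (the comma unpacks the 1-tuple from np.where)
starts, = np.where(is_start)

# Run lengths: gaps between consecutive starts, closed off by the array size
lengths = np.diff(starts, append=states.size)

print(starts)   # [0 3 6]
print(lengths)  # [3 3 3]
```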

## Compute variables needed to define the plotting function arguments

states = np.array(df['currState'])

# Create an array of indices of the rows where the light changes
# (i.e. where a new currState code section starts): note the comma
# to unpack the tuple returned by np.where
starts_indices, = np.where(np.concatenate(([True], states[:-1] != states[1:])))

# Append the last index to be able to compute the duration of the last
# light color period recorded in the dataset
starts_end_indices = np.append(starts_indices, states.size-1)

# Get the timestamps of those rows and convert them to python datetime format
starts_end_pydt = df.index[starts_end_indices].to_pydatetime()

# Convert the python timestamps to matplotlib date number that is used as the
# x-axis unit, this makes it easier to format the tick labels
starts_end_x = mdates.date2num(starts_end_pydt)

# Get the duration of each light color in matplotlib date number units
lengths = np.diff(starts_end_x)

# Add one second (expressed in matplotlib date number units) to the duration
# of the last light to make the bar chart left and right inclusive instead
# of just left inclusive. This assumes the rows are evenly spaced in time
pydt_second = (max(starts_end_x) - min(starts_end_x))/starts_end_indices[-1]
lengths[-1] = lengths[-1] + pydt_second

# Compute the arguments for the broken_barh plotting function
xranges = [(start, length) for start, length in zip(starts_end_x, lengths)]
yranges = (0.75, 0.5)
# Use .iloc because the indices are row positions, not datetime index labels
colors = df['currState'].iloc[starts_end_indices[:-1]].map(color_dict).tolist()


## Create horizontal bar with colors by using the broken_barh function
## and format ticks and tick labels

fig, ax = plt.subplots(figsize=(10,2))
ax.broken_barh(xranges, yranges, facecolors=colors, zorder=2)

# Create and format x ticks and tick labels
loc = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(loc)
formatter = mdates.AutoDateFormatter(loc)
formatter.scaled[1/(24.*60.)] = '%H:%M:%S' # adjust this according to time range
ax.xaxis.set_major_formatter(formatter)

# Format y-axis and create y tick and tick label
ax.set_ylim(0, 2)
ax.set_yticks([1])
ax.set_yticklabels([df['IntersectionId'].iloc[0]])

plt.grid(axis='x', alpha=0.5, zorder=1)

plt.show()

[image: broken_barh1]



Matplotlib broken_barh with seconds as x-axis scale

This approach takes advantage of the fact that the indices of the table can be used to compute the lights' durations in seconds. The downside is that this time the x ticks and tick labels must be created from scratch. The code is written so that labels automatically have a nice format depending on the total duration covered by the dataset. The only thing that needs adjusting is the number of ticks, as this depends on how wide the figure is.

The code used to automatically select an appropriate time step between ticks is based on this answer by kennytm. The datetime string format codes are listed here.
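
The time step selection boils down to picking, from a list of round values, the one closest to the total range divided by the desired number of ticks. A standalone sketch, where the helper name pick_time_step is hypothetical and used only for illustration:

```python
# Candidate steps in seconds, same list as in the code below
round_time_steps = [15, 30, 60, 120, 180, 240, 300, 600, 900,
                    1800, 3600, 7200, 14400]

def pick_time_step(trange, approx_nticks):
    """Return the round step closest to trange // approx_nticks (in seconds)."""
    target = trange // approx_nticks
    return min(round_time_steps, key=lambda step: abs(step - target))

# The sample dataset spans 3 cycles of 139 s, i.e. a range of 416 s
print(pick_time_step(416, 6))   # 60
# A one-hour range with about 6 ticks gives 10-minute steps
print(pick_time_step(3600, 6))  # 600
```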

## Compute the variables needed for the plotting function arguments
## using the currState variable

states = np.array(df['currState'])

# Create list of indices indicating the rows where the currState code
# changes: note the comma to unpack the tuple
starts_indices, = np.where(np.concatenate(([True], states[:-1] != states[1:])))
# Compute durations of each light in seconds
lengths = np.diff(starts_indices, append=states.size)


## Compute the arguments for the plotting function

xranges = [(start, length) for start, length in zip(starts_indices, lengths)]
yranges = (0.75, 0.5)
# Use .iloc because the indices are row positions, not datetime index labels
colors = df['currState'].iloc[starts_indices].map(color_dict).tolist()


## Create horizontal bar with colors using the broken_barh function

fig, ax = plt.subplots(figsize=(10,2))
ax.broken_barh(xranges, yranges, facecolors=colors, zorder=2)


## Create appropriate x ticks and tick labels

# Define time variable and parameters needed for computations
time = pd.DatetimeIndex(df.index).asi8 // 10**9 # time is in seconds
tmin = min(time)
tmax = max(time)
trange = tmax-tmin

# Choose the approximate number of ticks, the exact number depends on
# the automatically selected time step
approx_nticks = 6 # low number selected because figure width is only 10 inches
round_time_steps = [15, 30, 60, 120, 180, 240, 300, 600, 900, 1800, 3600, 7200, 14400]
time_step = min(round_time_steps, key=lambda x: abs(x - trange//approx_nticks))

# Create list of x ticks including the right boundary of the last time point
# in the dataset regardless of whether or not it is aligned with the time step
timestamps = np.append(np.arange(tmin, tmax, time_step), tmax+1)
xticks = timestamps-tmin
ax.set_xticks(xticks)

# Create x tick labels with format depending on time step
fmt_time = '%H:%M:%S' if time_step <= 60 else '%H:%M'
xticklabels = [pd.to_datetime(ts, unit='s').strftime(fmt_time) for ts in timestamps]
ax.set_xticklabels(xticklabels)


## Format y-axis limits, tick and tick label

ax.set_ylim(0, 2)
ax.set_yticks([1])
ax.set_yticklabels([df['IntersectionId'].iloc[0]])


plt.grid(axis='x', alpha=0.5, zorder=1)

plt.show()

[image: broken_barh2]



Further documentation: to_datetime, to_pydatetime, strftime
