简体   繁体   English

如何创建一个堆积条形图,指示每天花在筑巢上的时间

[英]how to create a stacked bar chart indicating time spent on nest per day

I have some data of an owl being present in the nest box.我有一些猫头鹰的数据存在于巢箱中。 In a previous question you helped me visualize when the owl is in the box:在上一个问题中,您帮助我想象猫头鹰何时在盒子里:

盒子里的猫头鹰

In addition I created a plot of the hours per day spent in the box with the code below (probably this can be done more efficiently):此外,我使用下面的代码创建了一个 plot 每天在框中花费的小时数(可能这可以更有效地完成):

import pandas as pd
import matplotlib.pyplot as plt

# raw data indicating time spent in box (each row represents start and end time)
time = pd.DatetimeIndex(["2021-12-01 18:08","2021-12-01 18:11",
                         "2021-12-02 05:27","2021-12-02 05:29",
                         "2021-12-02 22:40","2021-12-02 22:43",
                         "2021-12-03 19:24","2021-12-03 19:27",
                         "2021-12-06 18:04","2021-12-06 18:06",
                         "2021-12-07 05:28","2021-12-07 05:30",
                         "2021-12-10 03:05","2021-12-10 03:10",
                         "2021-12-10 07:11","2021-12-10 07:13",
                         "2021-12-10 20:40","2021-12-10 20:41",
                         "2021-12-12 19:42","2021-12-12 19:45",
                         "2021-12-13 04:13","2021-12-13 04:17",
                         "2021-12-15 04:28","2021-12-15 04:30",
                         "2021-12-15 05:21","2021-12-15 05:25",
                         "2021-12-15 17:40","2021-12-15 17:44",
                         "2021-12-15 22:31","2021-12-15 22:37",
                         "2021-12-16 04:24","2021-12-16 04:28",
                         "2021-12-16 19:58","2021-12-16 20:09",
                         "2021-12-17 17:42","2021-12-17 18:04",
                         "2021-12-17 22:19","2021-12-17 22:26",
                         "2021-12-18 05:41","2021-12-18 05:44",
                         "2021-12-19 07:40","2021-12-19 16:55",
                         "2021-12-19 20:39","2021-12-19 20:52",
                         "2021-12-19 21:56","2021-12-19 23:17",
                         "2021-12-21 04:53","2021-12-21 04:59",
                         "2021-12-21 05:37","2021-12-21 05:39",
                         "2021-12-22 08:06","2021-12-22 17:22",
                         "2021-12-22 20:04","2021-12-22 21:24",
                         "2021-12-22 21:44","2021-12-22 22:47",
                         "2021-12-23 02:20","2021-12-23 06:17",
                         "2021-12-23 08:07","2021-12-23 16:54",
                         "2021-12-23 19:36","2021-12-23 23:59:59",
                         "2021-12-24 00:00","2021-12-24 00:28",
                         "2021-12-24 07:53","2021-12-24 17:00",
                             ])

# create dataframe with column indicating presence (1) or absence (0)
time_df = pd.DataFrame(data={'present':[1,0]*int(len(time)/2)}, index=time)

# calculate interval length and add to time_df
time_df['interval'] = time_df.index.to_series().diff().astype('timedelta64[m]')

# add column with day to time_df
time_df['day'] = time.day

#select only intervals where owl is present 
timeinbox = time_df.iloc[1::2, :]
interval = timeinbox.interval
day = timeinbox.day

# sum multiple intervals per day
interval_tot = [interval[0]]
day_tot = [day[0]]
for i in range(1, len(day)):
    if day[i] == day[i-1]:
        interval_tot[-1] +=interval[i]
    else:
        day_tot.append(day[i])
        interval_tot.append(interval[i])

# recalculate to hours        
for i in range(len(interval_tot)):
    interval_tot[i] = interval_tot[i]/(60)

plt.figure(figsize=(15, 5))       
plt.grid(zorder=0)
plt.bar(day_tot, interval_tot, color='g', zorder=3) 

plt.xlim([1,31])
plt.xlabel('day in December')
plt.ylabel('hours per day in nest box')
plt.xticks(np.arange(1,31,1))
plt.ylim([0, 24])

Now I would like to combine all data in one plot by making a stacked bar chart, where each day is represented by a bar and each bar indicating for each of the 24*60 minutes whether the owl is present or not.现在我想通过制作堆叠条形图将所有数据合并到一个 plot 中,其中每一天由一个条形图表示,每个条形图表示 24*60 分钟的每一分钟是否存在猫头鹰。 Is this possible from the current data structure?从当前的数据结构来看,这可能吗?

The data seems to have been created manually, so I have changed the format of the data presented.数据似乎是手动创建的,因此我更改了呈现数据的格式。 The approach I took was to create the time spent and the time not spent, with a continuous index of 1 minute intervals with the start and end time as the difference time and a flag of 1. Now to create non-stay time, I will create a time series index of start and end date + 1 at 1 minute intervals.我采用的方法是创建花费的时间和未花费的时间,以 1 分钟为间隔的连续索引,以开始和结束时间作为差异时间,标志为 1。现在要创建非停留时间,我将以 1 分钟的间隔创建开始和结束日期 + 1 的时间序列索引。 Update the original data frame with the newly created index.使用新创建的索引更新原始数据框。 This is the data for the graph.这是图表的数据。 In the graph, based on the data frame extracted in days, create a color list with red for stay and green for non-stay.在图中,根据提取的以天为单位的数据框,创建一个颜色列表,红色表示停留,绿色表示非停留。 Then, in a bar graph, stack the height one.然后,在条形图中,堆叠高度一。 It may be necessary to consider grouping the data into hourly units.可能有必要考虑将数据分组为小时单位。

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import timedelta
import io

data = '''
start_time,end_time
"2021-12-01 18:08","2021-12-01 18:11"
"2021-12-02 05:27","2021-12-02 05:29"
"2021-12-02 22:40","2021-12-02 22:43"
"2021-12-03 19:24","2021-12-03 19:27"
"2021-12-06 18:04","2021-12-06 18:06"
"2021-12-07 05:28","2021-12-07 05:30"
"2021-12-10 03:05","2021-12-10 03:10"
"2021-12-10 07:11","2021-12-10 07:13"
"2021-12-10 20:40","2021-12-10 20:41"
"2021-12-12 19:42","2021-12-12 19:45"
"2021-12-13 04:13","2021-12-13 04:17"
"2021-12-15 04:28","2021-12-15 04:30"
"2021-12-15 05:21","2021-12-15 05:25"
"2021-12-15 17:40","2021-12-15 17:44"
"2021-12-15 22:31","2021-12-15 22:37"
"2021-12-16 04:24","2021-12-16 04:28"
"2021-12-16 19:58","2021-12-16 20:09"
"2021-12-17 17:42","2021-12-17 18:04"
"2021-12-17 22:19","2021-12-17 22:26"
"2021-12-18 05:41","2021-12-18 05:44"
"2021-12-19 07:40","2021-12-19 16:55"
"2021-12-19 20:39","2021-12-19 20:52"
"2021-12-19 21:56","2021-12-19 23:17"
"2021-12-21 04:53","2021-12-21 04:59"
"2021-12-21 05:37","2021-12-21 05:39"
"2021-12-22 08:06","2021-12-22 17:22"
"2021-12-22 20:04","2021-12-22 21:24"
"2021-12-22 21:44","2021-12-22 22:47"
"2021-12-23 02:20","2021-12-23 06:17"
"2021-12-23 08:07","2021-12-23 16:54"
"2021-12-23 19:36","2021-12-24 00:00"
"2021-12-24 00:00","2021-12-24 00:28"
"2021-12-24 07:53","2021-12-24 17:00"
'''
df = pd.read_csv(io.StringIO(data), sep=',')
df['start_time'] = pd.to_datetime(df['start_time'])
df['end_time'] = pd.to_datetime(df['end_time'])

time_df = pd.DataFrame()
for idx, row in df.iterrows():
    rng = pd.date_range(row['start_time'], row['end_time']-timedelta(minutes=1), freq='1min')
    tmp = pd.DataFrame({'present':[1]*len(rng)}, index=rng)
    time_df = time_df.append(tmp)
date_add = pd.date_range(time_df.index[0].date(), time_df.index[-1].date()+timedelta(days=1), freq='1min')
time_df = time_df.reindex(date_add, fill_value=0)
time_df['day'] = time_df.index.day

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8,15))
ax.set_yticks(np.arange(0,1500,60))
ax.set_ylim(0,1440)
ax.set_xticks(np.arange(1,25,1))

days = time_df['day'].unique()
for d in days:
    #if d == 1:
    day_df = time_df.query('day == @d')
    colors = [ 'r' if p == 1 else 'g' for p in day_df['present']]
    for i in range(len(day_df)): 
        ax.bar(d, height=1, width=0.5, bottom=i+1, color=colors[i])

plt.show()

在此处输入图像描述

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM