
Create boxplot from timedelta64 object in Python

I am new to Python and am used to working with R.

Usually I created numerical vectors of timedeltas and built boxplots from them. In Python it seems to be a bit more complex. Here is an extract of the list I got.

1502    4 days 17:51:16
1503    4 days 17:51:57
1504    4 days 17:48:24
1505    4 days 17:34:16
1506    4 days 17:32:58
1507    4 days 19:21:27
1508    4 days 19:52:43
1509    4 days 19:37:17
1510    4 days 21:00:30
1511    5 days 00:56:52
1512    3 days 00:56:04
1513                NaT
Length: 1514, dtype: timedelta64[ns]

And I tried this on the list:

# Create a figure instance
fig = plt.figure(1, figsize=(9, 6))

# Create an axes instance
ax = fig.add_subplot(111)

# Create the boxplot
bp = ax.boxplot(timediff)

# Save the figure
fig.savefig('fig1.png', bbox_inches='tight')

I do get an output, but it seems to be completely wrong. Can someone help me? Is there a mistake in the datatypes?

Currently, your boxplot appears to be plotting the raw integers behind your time differences. A timedelta64[ns] value is stored as a count of nanoseconds (it is a duration, not a Unix timestamp measured from the 1970-01-01 00:00:00 epoch), so the y-axis ends up on a very large integer scale: 1e19.

Consider converting the time difference values to the unit you require: days with decimal points. Then plot the series.
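
To see what a plotting function receives when it is handed timedelta64[ns] data, it can help to look at the integers behind the values. The following is a minimal sketch using a small made-up series (the names sample and raw_ns are illustrative and not from the question). It also shows that a missing value (NaT) is backed by the smallest 64-bit integer, which may be worth checking for when a plot shows an implausibly low value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two durations and one missing value (NaT)
sample = pd.Series(
    pd.to_timedelta(["4 days 17:51:16", "3 days 00:56:04", None])
).astype("timedelta64[ns]")

# Reinterpret the values as 64-bit integers: these are nanosecond counts
raw_ns = sample.to_numpy().view("int64")
print(raw_ns)

# 4 days 17:51:16 is 409876 seconds, stored as nanoseconds
print(raw_ns[0] == 409876 * 10**9)

# NaT is stored as the smallest int64 value
print(raw_ns[2] == np.iinfo(np.int64).min)
```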

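# timediff_raw is the original timedelta64[ns] series.
# Days + whole hours + whole minutes, in fractional days (leftover seconds are dropped)
timediff = timediff_raw.dt.days + \
              (timediff_raw.dt.seconds//3600) / 24 + \
              ((timediff_raw.dt.seconds//60)%60) / (24*60)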

print(timediff.head(10))

Note: The very low outlier will remain, because the same graph is rendered, only with different y-axis units.
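
As an alternative to combining the components by hand, pandas also offers Series.dt.total_seconds(), which returns the full duration in seconds. The sketch below is one possible variant, not the method used in this answer: it assumes a small made-up timediff_raw series, scales seconds to days, and drops missing values (NaT becomes NaN) before plotting. The names timediff_days and timediff_clean are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the question's series, including a missing value
timediff_raw = pd.Series(
    pd.to_timedelta(["4 days 12:00:00", "3 days 06:00:00", None])
)

# Full duration in seconds, scaled to days; NaT turns into NaN
timediff_days = timediff_raw.dt.total_seconds() / (24 * 60 * 60)

# Drop missing values before handing the data to a plotting function
timediff_clean = timediff_days.dropna()
print(timediff_clean.tolist())  # [4.5, 3.25]
```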


To demonstrate with a reproducible, randomly seeded example:

Data (series of 50 elements)

import numpy as np
import pandas as pd
import datetime as dt
import time
import matplotlib.pyplot as plt

# CURRENT TIME STAMP
epoch_time = int(time.time())

np.random.seed(81618)
time1 = pd.Series([dt.datetime.fromtimestamp(np.random.randint(1530000000, epoch_time)) 
                   for _ in range(50)])
time2 = pd.Series([dt.datetime.fromtimestamp(np.random.randint(1530000000, epoch_time)) 
                   for _ in range(50)])

print(time1.head())
# 0   2018-07-29 04:12:07
# 1   2018-07-02 07:48:08
# 2   2018-08-17 05:04:59
# 3   2018-08-06 21:37:45
# 4   2018-07-15 10:27:10
# dtype: datetime64[ns]

print(time2.head())
# 0   2018-07-25 09:11:39
# 1   2018-08-15 07:05:39
# 2   2018-07-06 08:19:05
# 3   2018-07-13 19:08:30
# 4   2018-07-24 11:13:06
# dtype: datetime64[ns]

Time Difference Conversion (using pandas.Series.dt)

timediff_raw = (time1 - time2)

timediff = timediff_raw.dt.days + \
              (timediff_raw.dt.seconds / (60*60*24))  # NUMBER OF SECONDS IN A DAY

print(timediff.head())
# 0     3.791991
# 1   -43.970498
# 2    41.865208
# 3    24.103646
# 4    -9.031898
# dtype: float64
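
The same conversion can also be written as a division by a one-day pd.Timedelta. The sketch below rebuilds the first two rows of time1 and time2 from the printed output above and checks that both approaches agree, including for a negative difference (where dt.days is the floored day count and dt.seconds is the non-negative remainder). The names by_components and by_division are illustrative.

```python
import numpy as np
import pandas as pd

# First two rows of time1 and time2, copied from the printed output above
time1 = pd.Series(pd.to_datetime(["2018-07-29 04:12:07", "2018-07-02 07:48:08"]))
time2 = pd.Series(pd.to_datetime(["2018-07-25 09:11:39", "2018-08-15 07:05:39"]))
timediff_raw = time1 - time2

# Approach used above: day component plus the seconds component as a fraction of a day
by_components = timediff_raw.dt.days + timediff_raw.dt.seconds / (60 * 60 * 24)

# Alternative: divide the timedelta series by a one-day Timedelta
by_division = timediff_raw / pd.Timedelta(days=1)

print(np.allclose(by_components, by_division))
print(by_division.round(6).tolist())
```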

Graph

# Create a figure instance
fig = plt.figure(figsize=(9, 6))

# Create an axes instance
ax = fig.add_subplot(111)

# Create the boxplot
ax.boxplot(timediff)
plt.xlabel('Single Series')
plt.ylabel('Time Difference (Days)')

plt.show()
plt.clf()
plt.close('all')
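
For scripts that run without a display, a variant along the following lines may be useful. It is only a sketch under stated assumptions: the Agg backend is chosen for headless use, the timediff values are made up, and missing values are dropped before plotting. It uses the dictionary returned by ax.boxplot to compare the drawn median with the pandas median.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless scripts
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical time differences in days, with one missing value
timediff = pd.Series([3.79, -43.97, 41.87, 24.10, -9.03, float("nan")])

fig, ax = plt.subplots(figsize=(9, 6))

# Drop NaN and pass a plain NumPy array to the boxplot
bp = ax.boxplot(timediff.dropna().to_numpy())
ax.set_xlabel("Single Series")
ax.set_ylabel("Time Difference (Days)")

# The median line drawn by matplotlib should match the pandas median
median_drawn = bp["medians"][0].get_ydata()[0]
print(median_drawn, timediff.median())

plt.close(fig)
```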

