Create boxplot from timedelta64 object in Python
I am new to Python and used to work with R.
Usually I created numerical vectors of timedeltas and built boxplots. In Python it seems to be a bit more complex.
Here is an extract of the list I got.
1502 4 days 17:51:16
1503 4 days 17:51:57
1504 4 days 17:48:24
1505 4 days 17:34:16
1506 4 days 17:32:58
1507 4 days 19:21:27
1508 4 days 19:52:43
1509 4 days 19:37:17
1510 4 days 21:00:30
1511 5 days 00:56:52
1512 3 days 00:56:04
1513 NaT
Length: 1514, dtype: timedelta64[ns]
And I tried this on the list:
# Create a figure instance
fig = plt.figure(1, figsize=(9, 6))
# Create an axes instance
ax = fig.add_subplot(111)
# Create the boxplot
bp = ax.boxplot(timediff)
# Save the figure
fig.savefig('fig1.png', bbox_inches='tight')
I do get an output, but it seems to be completely wrong. Can someone help me? Is there a mistake in the datatypes?
Currently, your boxplot is drawing the raw integer values that back your timedelta64[ns] time differences. These are nanosecond counts, not Unix time (Unix time counts seconds elapsed since the epoch, 1970-01-01 00:00:00, and applies to timestamps rather than durations). Hence, the y-axis units are on a very large integer scale: 1e19.
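To see where such large numbers can come from, here is a minimal sketch that exposes the integer storage behind a timedelta64[ns] series. The two values are illustrative (one duration copied from the question's extract plus a missing entry), and whether matplotlib performs exactly this conversion on your data depends on the library versions, so treat that link as an assumption rather than a diagnosis.

```python
import numpy as np
import pandas as pd

# Illustrative values only: one duration from the question's extract
# plus a missing entry (NaT), as in the last row of the question's data
td = pd.Series(pd.to_timedelta(["4 days 17:51:16", None]))

# Reinterpret the timedelta64[ns] values as their underlying int64 storage
raw = td.to_numpy().astype("int64")

print(raw[0])  # the duration expressed as a nanosecond count
print(raw[1] == np.iinfo(np.int64).min)  # NaT is stored as the smallest int64
```

A duration of a few days is on the order of 1e14 nanoseconds, while the integer behind NaT is about -9.2e18, so a missing value that reaches the plot as an integer could account for both the 1e19 axis scale and a single very low point.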
Consider converting the time difference values to the units you require: days with decimal points. Then plot the series.
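# timediff_raw holds the original timedelta64[ns] series
# whole days + hours as a fraction of a day + minutes as a fraction of a day
timediff = timediff_raw.dt.days + \
           (timediff_raw.dt.seconds//3600) / 24 + \
           ((timediff_raw.dt.seconds//60)%60) / (24*60)
print(timediff.head(10))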
Note: The very low outlier will remain, since the same graph will render but with different y-axis units.
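As a more compact alternative to summing the components by hand, dividing a timedelta series by a one-day pd.Timedelta gives fractional days directly. The sketch below is not part of the original answer: it uses a small hypothetical series modeled on the question's extract, and it drops the missing entry before plotting on the assumption that NaT rows carry no information for the boxplot.

```python
import pandas as pd

# Hypothetical sample resembling the question's data, including a missing value
timediff_raw = pd.Series(pd.to_timedelta(
    ["4 days 17:51:16", "4 days 19:21:27", "5 days 00:56:52",
     "3 days 00:56:04", None]
))

# Dividing by a one-day Timedelta yields float days; NaT becomes NaN
timediff_days = timediff_raw / pd.Timedelta(days=1)

# Remove missing values so they do not reach the plotting call
timediff_days = timediff_days.dropna()

print(timediff_days)
```

The resulting float64 series can be passed to ax.boxplot in place of the raw timedelta values.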
To demonstrate with a reproducible, random seeded example:
Data (series of 50 elements)
import numpy as np
import pandas as pd
import datetime as dt
import time
import matplotlib.pyplot as plt
# CURRENT TIME STAMP
epoch_time = int(time.time())
np.random.seed(81618)
time1 = pd.Series([dt.datetime.fromtimestamp(np.random.randint(1530000000, epoch_time))
for _ in range(50)])
time2 = pd.Series([dt.datetime.fromtimestamp(np.random.randint(1530000000, epoch_time))
for _ in range(50)])
print(time1.head())
# 0 2018-07-29 04:12:07
# 1 2018-07-02 07:48:08
# 2 2018-08-17 05:04:59
# 3 2018-08-06 21:37:45
# 4 2018-07-15 10:27:10
# dtype: datetime64[ns]
print(time2.head())
# 0 2018-07-25 09:11:39
# 1 2018-08-15 07:05:39
# 2 2018-07-06 08:19:05
# 3 2018-07-13 19:08:30
# 4 2018-07-24 11:13:06
# dtype: datetime64[ns]
Time Difference Conversion (using pandas.Series.dt)
timediff_raw = (time1 - time2)
timediff = timediff_raw.dt.days + \
(timediff_raw.dt.seconds / (60*60*24)) # NUMBER OF SECONDS IN A DAY
print(timediff.head())
# 0 3.791991
# 1 -43.970498
# 2 41.865208
# 3 24.103646
# 4 -9.031898
# dtype: float64
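The conversion above also handles negative differences, because pandas normalizes a negative timedelta into a negative day count plus a non-negative seconds remainder. A minimal sketch with two made-up values (not taken from the data above) compares the component sum with dt.total_seconds():

```python
import pandas as pd

# Two illustrative durations: minus half a day, and 3 days 19 hours
td = pd.Series([pd.Timedelta(hours=-12), pd.Timedelta(days=3, hours=19)])

seconds_per_day = 60 * 60 * 24

# Component-based conversion, as in the answer
by_components = td.dt.days + td.dt.seconds / seconds_per_day

# Equivalent conversion from the total duration in seconds
by_total = td.dt.total_seconds() / seconds_per_day

print(td.dt.days.tolist())     # -12 hours is stored as -1 day ...
print(td.dt.seconds.tolist())  # ... plus 43200 seconds
print(by_components.tolist())
print(by_total.tolist())
```

Both approaches agree here; total_seconds() also includes any sub-second part, which dt.seconds leaves out.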
Graph
# Create a figure instance
fig = plt.figure(figsize=(9, 6))
# Create an axes instance
ax = fig.add_subplot(111)
# Create the boxplot
ax.boxplot(timediff)
plt.xlabel('Single Series')
plt.ylabel('Time Difference (Days)')
plt.show()
plt.clf()
plt.close('all')