
Plotly with datetime.time() in the x-axis and missing values

I have 2 pandas dataframes, df1 and df2, which both have data from 2 different days between 21:00 and 08:00. The data should be 1 data point per minute; however, there are missing values, e.g.:

       location        time             Data
0          1         21:00:00            8
1          1         21:02:00            6

The data point for 21:01:00 does not exist. The missing data points occur at different times in each dataframe, so when I try to plot both of them on the same plot, this happens:

[screenshot]

If I plot them individually, they're both correct. I think the horizontal red lines are caused by time values that exist in the red dataframe but not in the blue one.

Has anyone encountered this before? I want to plot both of them on the same axis, starting at 21:00 and finishing at 08:00.

Here is the code I'm using:

import datetime

import pandas as pd
import plotly.express as px

df1 = pd.DataFrame({'location': 1,
                    'data': ['3', '4', '5'], 
                   'time': [datetime.datetime(2022,7,16,21,0,0).time(), 
                            datetime.datetime(2022,7,16,21,1,0).time(), 
                            datetime.datetime(2022,7,16,21,3,0).time()]})
df2 = pd.DataFrame({'location': 2,
                    'data': ['8', '6', '7'], 
                   'time': [datetime.datetime(2022,7,17,21,0,0).time(), 
                            datetime.datetime(2022,7,17,21,2,0).time(), 
                            datetime.datetime(2022,7,17,21,3,0).time()]})

df = pd.concat([df1,df2], axis=0)

fig = px.line(df, x="time", y="data", color='location')
fig.show()

Thanks!

The problem is with the time column. When you convert the values with .time(), pandas stores the column with object dtype (check df.info()), so Plotly treats the values as categories rather than as a continuous time axis. To avoid this, keep the data as datetimes and use update_xaxes() to format the tick labels as times. Updated code below:

import datetime

import pandas as pd
import plotly.express as px

df1 = pd.DataFrame({'location': 1,
                    'data': ['3', '4', '5'], 
                   'time': [datetime.datetime(2022,7,16,21,0,0), 
                            datetime.datetime(2022,7,16,21,1,0), 
                            datetime.datetime(2022,7,16,21,3,0)]})
df2 = pd.DataFrame({'location': 2,
                    'data': ['8', '6', '7'], 
                   'time': [datetime.datetime(2022,7,16,21,0,0), 
                            datetime.datetime(2022,7,16,21,2,0), 
                            datetime.datetime(2022,7,16,21,3,0)]})

df = pd.concat([df1,df2], axis=0)

fig = px.line(df, x="time", y="data", color='location')
fig.update_xaxes(tickformat="%H:%M:%S")
fig.show()
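To see why the original version misbehaves, here is a small pure-pandas check that reuses the question's three-row sample. It prints the dtype of the combined time column and lists the values in order of first appearance. That order is what Plotly's default categorical ordering (assumed here to be `categoryorder="trace"`, i.e. order of appearance across traces) would use: 21:02, which exists only in location 2, lands after 21:03, so the second trace would jump forward and then back, which fits the horizontal lines described in the question.

```python
import datetime

import pandas as pd

# Same shape as the question's data: time-of-day values only, no dates
df1 = pd.DataFrame({"location": 1,
                    "time": [datetime.time(21, 0), datetime.time(21, 1), datetime.time(21, 3)]})
df2 = pd.DataFrame({"location": 2,
                    "time": [datetime.time(21, 0), datetime.time(21, 2), datetime.time(21, 3)]})
df = pd.concat([df1, df2], axis=0)

# pandas has no dedicated time-of-day dtype, so datetime.time values are stored as object
print(df["time"].dtype)  # object

# Unique values in order of first appearance (location 1 rows first, then location 2)
category_order = list(dict.fromkeys(df["time"]))
print([t.strftime("%H:%M") for t in category_order])  # ['21:00', '21:01', '21:03', '21:02']
```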

Plot:

[screenshot]

Thank you for your help @Redox, it was very helpful, but unfortunately it doesn't work the way I want when using the full datasets. This is the result for the equivalent of the following code:

import datetime

import pandas as pd
import plotly.express as px

## Note that you need to use .time()
df1 = pd.DataFrame({'location': 1, 'data': ['3', '4', '5'], 
                   'time': [datetime.datetime(2022,7,17,21,0,0).time(), 
                            datetime.datetime(2022,7,17,21,1,0).time(), 
                            datetime.datetime(2022,7,17,21,3,0).time()]})
df2 = pd.DataFrame({'location': 2, 'data': ['8', '6', '7'], 
                   'time': [datetime.datetime(2022,7,16,21,0,0).time(), 
                            datetime.datetime(2022,7,16,21,2,0).time(), 
                            datetime.datetime(2022,7,16,21,3,0).time()]})

df = pd.concat([df1,df2], axis=0)
date = str(datetime.datetime.strptime('2022-01-01', '%Y-%m-%d').date())  ##Random dummy date
df['time'] = pd.to_datetime(date + " " + df['time'].astype(str)) ##Convert back to datetime
fig = px.line(df, x="time", y="data", color='location')
fig.update_xaxes(tickformat="%H:%M")
fig.show()

[screenshot]

When I try this:

import datetime

import plotly.express as px

# df is the combined dataframe from the previous snippet
dt = datetime.datetime.strptime('2022-01-01', '%Y-%m-%d')
starttime = dt.replace(hour=21, minute=0)  ## Start time is 9PM
dt = datetime.datetime.strptime('2022-01-02', '%Y-%m-%d')
endtime = dt.replace(hour=8, minute=0)  ## End time is 8AM next day
fig = px.line(df, x="time", y="data", color='location', range_x=[starttime, endtime])
fig.show()

This is the result:

[screenshot]
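A likely reason both attempts fall short: every time, including those after midnight, is attached to the same dummy date, so 00:00-08:00 lands before 21:00 on the axis, and a range_x window from 21:00 to 08:00 the next day then hides the morning points. Below is a minimal pandas sketch, assuming any time before a noon cutoff belongs to the following morning. The function name `night_timestamps`, the anchor date, the cutoff and the sample data are hypothetical choices, not part of the original posts.

```python
import datetime

import pandas as pd


def night_timestamps(times: pd.Series, anchor: str = "2022-01-01",
                     cutoff_hours: int = 12) -> pd.Series:
    """Map datetime.time values onto one overnight dummy timeline (hypothetical helper).

    Times at or after the cutoff go on the anchor date; earlier times are
    assumed to belong to the next morning (anchor + 1 day).
    """
    # Time of day as a Timedelta since midnight, e.g. 21:00:00 -> 21 hours
    offset = pd.to_timedelta(times.astype(str))
    # 1 for "next morning" rows, 0 for "evening" rows
    next_day = (offset < pd.Timedelta(hours=cutoff_hours)).astype(int)
    return pd.Timestamp(anchor) + offset + pd.to_timedelta(next_day, unit="D")


# Hypothetical sample data for two nights, already reduced to time of day
df = pd.DataFrame({
    "location": [1, 1, 1, 2, 2, 2],
    "data": [3, 4, 5, 8, 6, 7],
    "time": [datetime.time(21, 0), datetime.time(23, 59), datetime.time(0, 1),
             datetime.time(21, 0), datetime.time(23, 58), datetime.time(0, 2)],
})
df["timestamp"] = night_timestamps(df["time"])
# Sort within each location so each line is drawn left to right
df = df.sort_values(["location", "timestamp"])
print(df[["location", "timestamp"]])
```

The `timestamp` column could then be passed as `x` to `px.line`, together with `fig.update_xaxes(tickformat="%H:%M")` as in the earlier snippets, so the axis runs continuously from 21:00 to 08:00. The noon cutoff only suits data recorded roughly between 21:00 and 08:00; another recording window would need a different cutoff.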

Here is what worked for me eventually:

import datetime

import pandas as pd
import plotly.express as px

# Time as fractional hours (e.g. 21:03 -> 21.05) so both days share one merge key
df1 = pd.DataFrame({'location': 1, 'data': ['3', '4', '5'],
                    'time_num': [datetime.datetime(2022,7,17,21,0,0).time().hour + datetime.datetime(2022,7,17,21,0,0).time().minute/60,
                                 datetime.datetime(2022,7,17,21,1,0).time().hour + datetime.datetime(2022,7,17,21,1,0).time().minute/60,
                                 datetime.datetime(2022,7,17,21,3,0).time().hour + datetime.datetime(2022,7,17,21,3,0).time().minute/60]})
df2 = pd.DataFrame({'location': 2, 'data': ['8', '6', '7'],
                    'time_num': [datetime.datetime(2022,7,16,21,0,0).time().hour + datetime.datetime(2022,7,16,21,0,0).time().minute/60,
                                 datetime.datetime(2022,7,16,21,2,0).time().hour + datetime.datetime(2022,7,16,21,2,0).time().minute/60,
                                 datetime.datetime(2022,7,16,21,3,0).time().hour + datetime.datetime(2022,7,16,21,3,0).time().minute/60]})

# Skeleton with one row per minute for the whole night, in chronological order
df_skeleton = pd.DataFrame()
df_skeleton['date'] = pd.date_range(datetime.datetime(2022,7,16,20,0,0), datetime.datetime(2022,7,17,8,0,0), freq='1min')
df_skeleton['time'] = df_skeleton['date'].dt.strftime('%H:%M:%S')
df_skeleton['hour'] = df_skeleton['date'].dt.hour
df_skeleton['min'] = df_skeleton['date'].dt.minute
df_skeleton['time_num'] = df_skeleton['hour'] + df_skeleton['min']/60

# Left-join each dataframe onto the skeleton; minutes without data get NaN
result_1 = pd.merge(df_skeleton, df1, how="left", on="time_num")
result_2 = pd.merge(df_skeleton, df2, how="left", on="time_num")
result_1['location'] = '1'
fig = px.line(result_1, x='time', y='data', color='location')
fig.add_scatter(x=result_2['time'], y=result_2['data'], mode='lines', name='2')
# Draw the lines across the NaN rows instead of breaking them
fig.update_traces(connectgaps=True)
fig.show()

I'm not overly pleased with it, but it works with both the dummy dataframes and the full dataframes.

[screenshots]
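The skeleton-and-left-join idea above can also be written with `DataFrame.reindex`, which avoids the float `time_num` key. A minimal sketch, assuming each dataframe has a real datetime column named `time` with unique, minute-aligned values; the helper `fill_missing_minutes` and the sample data are hypothetical:

```python
import pandas as pd


def fill_missing_minutes(df: pd.DataFrame, freq: str = "1min") -> pd.DataFrame:
    """Insert NaN rows for every missing minute between the first and last time."""
    full_range = pd.date_range(df["time"].min(), df["time"].max(), freq=freq, name="time")
    # reindex needs a unique index; minutes absent from df become all-NaN rows
    return df.set_index("time").reindex(full_range).reset_index()


# Hypothetical sample with 21:01 missing
df1 = pd.DataFrame({
    "location": 1,
    "time": pd.to_datetime(["2022-07-16 21:00", "2022-07-16 21:02", "2022-07-16 21:03"]),
    "data": [8, 6, 7],
})
filled = fill_missing_minutes(df1)
# The inserted rows have no location either, so fill it back in
filled["location"] = filled["location"].fillna(1)
print(filled)
```

The filled frame keeps NaN in `data` for the missing minutes, so a Plotly trace built from it still needs `connectgaps=True` if the lines should be drawn across the holes.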

  • Started by simulating data that has the features you describe: from 21:00 to 08:00, on different dates, with different randomly removed minutes.
  • Then integrate this data. The approach taken:
    1. Fill the missing minutes in each dataframe with an outer join to all minutes in its time range.
    2. Outer join the two dataframes on the time of day only.

This gives a differently structured dataframe:

   location_x  time_x               Data_x     t         location_y  time_y               Data_y
0  1           2022-09-01 21:00:00  0          21:00:00  2           2022-09-04 21:00:00  1
1  1           2022-09-01 21:01:00  0.0302984  21:01:00  2           2022-09-04 21:01:00  0.999541
2  1           2022-09-01 21:02:00  0.060569   21:02:00  2           2022-09-04 21:02:00  0.998164
3  1           2022-09-01 21:03:00  0.0907839  21:03:00  2           2022-09-04 21:03:00  0.995871
4  1           2022-09-01 21:04:00  0.120916   21:04:00  2           2022-09-04 21:04:00  nan

It is then simple to generate a px.line() figure from this. The traces are Data_x and Data_y, and the datetime column time_x is used for the x-axis. This works well because Plotly integrates datetime values with a continuous axis. The tickformat is updated so the date part of the axis is not displayed.

import pandas as pd
import numpy as np
import plotly.express as px

dr = pd.date_range("2022-09-01 21:00", "2022-09-02 08:00", freq="1Min")

# data to match the question: two dataframes from 21:00 to 08:00 on different dates,
# each with some randomly removed minutes (holes)
dfs = [
    pd.DataFrame(
        {
            "location": np.full(len(dr), l),
            "time": dr + pd.DateOffset(days=o),
            "Data": f(np.linspace(0, 20, len(dr))),
        }
    )
    .sample(frac=0.95)
    .sort_index()
    for l, o, f in zip([1, 2], [0, 3], [np.sin, np.cos])
]


df1 = dfs[0]
df2 = dfs[1]

# let's integrate the dataframes
# 1. fill the holes in each dataframe by doing an outer join to all times
# 2. outer join the two dataframes on just the time
df = pd.merge(
    *[
        pd.merge(
            d,
            pd.DataFrame(
                {"time": pd.date_range(d["time"].min(), d["time"].max(), freq="1min")}
            ),
            on="time",
            how="outer",
        )
        .fillna({"location": l})
        .assign(t=lambda d: d["time"].dt.time)
        for d, l in zip([df1, df2], [1, 2])
    ],
    on="t",
    how="outer",
)


# finally generate plotly line chart using columns created by merging the data
# it's clearly observed there are gaps in both traces
px.line(
    df.sort_values("time_x"), x="time_x", y=["Data_x", "Data_y"], hover_data=["time_y"]
).update_layout({"xaxis": {"tickformat": "%H:%M"}})

Output:

[screenshot]

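Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.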

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM