
Plotly line chart with confidence interval using groupby

I'd like to plot time series simulation data as a mean with confidence intervals and compare multiple scenarios. Using the pandas groupby() and agg() functions, I calculate the mean and the confidence interval (upper and lower limits); see the sample data (the actual data can be retrieved from my GitHub).
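For reference, an aggregation of this kind can be sketched as follows. The column names (`tick`, `scenario`, `value`) and the synthetic data are assumptions made for illustration, and the interval uses the normal approximation mean ± 1.96·std/√n, which may differ from how the numbers in the question were computed.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the simulation output (hypothetical column names).
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "tick": np.tile(np.arange(5), 40),        # 5 time steps, repeated 40 times
    "scenario": np.repeat(["a", "b"], 100),   # two scenarios, 100 rows each
    "value": rng.normal(0.5, 0.05, 200),
})

# One row per (tick, scenario) with the statistics needed for the interval.
stats = raw.groupby(["tick", "scenario"])["value"].agg(["mean", "count", "std"])

# 95% interval via the normal approximation (an assumption, not necessarily
# the method used for the data in the question).
half_width = 1.96 * stats["std"] / np.sqrt(stats["count"])
stats["ci95_lo"] = stats["mean"] - half_width
stats["ci95_hi"] = stats["mean"] + half_width
```

The result has a two-level MultiIndex (`tick`, `scenario`), which matches the shape of the aggregated data described in the question.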

Edit: I added the raw data and the code (as a Jupyter notebook) to the Git repository.

Plotting this data for one specific parameter combination (selecting data via df = df.loc[(slice(None),1, True)] ) seems simple enough:

import plotly.graph_objects as go

myFig = go.Figure([
go.Scatter(
    name='Mittelwert',
    #x=df['tick'],
    y=df['mean'],
    mode='lines',
    line=dict(color='rgb(31, 119, 255)'),
),
go.Scatter(
    name='Konfidenzintervall',
    #x=df['tick'],
    y=df['ci95_hi'],
    mode='lines',
    marker=dict(color="#644"),
    line=dict(width=0),
    showlegend=True
),
go.Scatter(
    name='Konfidenzintervall',
    #x=df['tick'],
    y=df['ci95_lo'],
    marker=dict(color="#448"),
    line=dict(width=0),
    mode='lines',
    fillcolor='rgba(130, 68, 68, 0.5)',
    fill='tonexty',
    showlegend=True
)
])
myFig.update_layout(
    xaxis_title='X Achse',
    yaxis_title='Y Achse',
    title='Continuous, variable value error bars',
    hovermode="x"
    )
myFig.show()

This code gives me this beautiful plot. The issue is that I do not know how to properly plot the grouped data. When I don't select a subset, all data is plotted at once. Therefore I tried to use color, facet_col and facet_row, which I could get working using:

px.line(reset_df, x = "tick", y="mean", color="first_factor",facet_col="second_factor")

(Since plotly apparently can't handle a MultiIndex DataFrame, I used reset_index() to get a DataFrame with a RangeIndex first.) The issue with the latter approach is that I'm now missing the confidence interval and don't know how to add it (cf. this plot).
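As a small illustration of that flattening step, the sketch below builds a hypothetical MultiIndex with the level names `tick`, `first_factor` and `second_factor` (assumed here, since the question does not spell them out) and shows what reset_index() does with it.

```python
import pandas as pd

# Hypothetical MultiIndex mirroring (tick, first_factor, second_factor).
index = pd.MultiIndex.from_product(
    [[0, 1], [1.0, 2.0], [True, False]],
    names=["tick", "first_factor", "second_factor"],
)
agg_df = pd.DataFrame({"mean": list(range(8))}, index=index)

# reset_index() moves every index level into an ordinary column
# and leaves a plain RangeIndex behind.
reset_df = agg_df.reset_index()
print(reset_df.columns.tolist())
# ['tick', 'first_factor', 'second_factor', 'mean']
```

After this step the former index levels can be passed by name to arguments such as color or facet_col.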

How can I have both the CI and the grouped data within one plot? If this is not possible with pandas, is it possible with bokeh? Thank you very much.

(This is a work in progress.)


It's still not 100% clear to me how you'd like to display your data here. In your code sample:

px.line(reset_df, x = "tick", y="mean", color="first_factor",facet_col="second_factor")

... you're using first_factor and second_factor without saying what those are. And judging by the column names in your dataframe, it could be almost any of:

['mean', 'median', 'count', 'std', 'min', 'max', 'ci95_lo', 'ci95_hi']

... or your index:

'(1.0, 0.0, False)'

But since it seems clear that you'd like to display a confidence interval around a mean across some categories, I'm guessing that this is what you're looking for:

Plot 1:

[image]

Plot 2: Zoomed in on the first plot

[image]

The way to get there is full of pitfalls, and I'm not going to waste anyone's time talking about the details if this is not what you're looking for. But if it is, I'd be happy to explain everything.

Complete code:

import pandas as pd
import plotly.express as px

# Load the aggregated data; the index holds strings such as '(1.0, 0.0, False)'
df = pd.read_json("https://raw.githubusercontent.com/thefeinkoster/plotly_issue/main/aggregatedData.json")
df = df.reset_index()
# Split the index string into its three parts (new columns 0, 1 and 2)
df = pd.concat([df, df['index'].str.split(',', expand = True)], axis = 1)
df = df.rename(columns = {0:'ix', 1:'scenario', 'ci95_lo':'lo', 'ci95_hi':'hi'})

# Strip the leading '(' and convert the first part to an integer
df['ix'] = [int(float(i[1:])) for i in df['ix']]
# Keep only the rows whose third index part is True
df = df[df[2]==' True)']

# Reshape to long format and keep the mean and the interval bounds
dfp=pd.melt(df, id_vars=['ix', 'scenario'], value_vars=df.columns[-12:])
dfp = dfp[dfp['variable'].isin(['ix', 'mean', 'lo', 'hi'])]

# Sort so that the traces are created in a fixed order per facet
dfp = dfp.sort_values(['ix', 'variable'], ascending = False)

fig = px.line(dfp, x = 'ix', y = 'value', color = 'variable', facet_row = 'scenario')
fig.update_layout(height = 800)

# Fill between traces, then make every fill except the interval transparent
fig.update_traces(name = 'interval', selector = dict(name = 'hi'))
fig.update_traces(fill = 'tonexty')
fig.update_traces(fillcolor = 'rgba(0,0,0,0)', selector = dict(name = 'mean'))
fig.update_traces(fillcolor = 'rgba(0,0,0,0)', line_color = 'rgba(44, 160, 44, 0.5)',
                  showlegend = False, selector = dict(name = 'lo'))

fig.update_yaxes(range=[0.44, 0.54])
f = fig.full_figure_for_development(warn=False)
fig.show()
