How to boxplot multiple dictionaries on the same plot

For a change point detection task, I am testing my own algorithm against a baseline, and I would like to plot the results from the two algorithms on the same boxplot.

My results (F-score values) are stored in a dictionary whose keys are combinations of two parameters, a and b (each with 4 distinct values):

resultsOwnAlgorithm = {'a1, b1': [0.8, 0.7, 0.6, ...], 'a1, b2': [...], ..., 'a2, b1': [...], ...}
resultsBaseline = {'a1, b1': [0.7, 0.6, ...], 'a1, b2': [...], ..., 'a2, b1': [...], ...}

For now, I have a function that plots them individually. I create 4 subplots, each with a fixed and b varying; see the image (values are random, just to create an example image). The function looks like this:

def plotResults(results, keys, test):
    
    fig, axs = plt.subplots(2,2,figsize=(10,10))
    for ax in axs.flat:
        ax.set_ylim(0,1)
        ax.set_xticks(range(len(abrs)))
        ax.set_xticklabels(abrs)
    
    count = 0
    for i in (0,1):
        for j in (0,1):
            axs[i,j].set_title(str(test) + ', mean shift: ' + str(keys[count][0][0:2]).strip('x,') + ', iters=' + str(iterations), fontweight ="bold")
            l = keys[count]
            k = {k:results[k] for k in l if k in results}
            label, data = k.keys(), k.values()
            axs[i,j].boxplot(data,showfliers=False,patch_artist=True)
            axs[i,j].set_xticks(range(1, len(label) + 1))
            axs[i,j].set_xticklabels(label)
            count+=1

where results is either resultsOwnAlgorithm or resultsBaseline, keys holds the dictionary keys (the different combinations of a and b), and test is only used to put the name of the algorithm being plotted in the title.

My question is: how do I plot them side by side on the same plot?


There are a few errors in your plotting function, so I couldn't get it to work without making big assumptions, such as what abrs and iterations are. You should fix them before continuing with your work, because the function is most likely picking them up from the global scope (assuming a Jupyter notebook), and that will lead to bugs later on, as I've painfully experienced before.
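As a rough sketch of what "not relying on the global scope" could look like, the function below takes iterations as an explicit argument and drops abrs altogether, since the tick labels are overwritten per subplot anyway. The name plot_results, the meaning of iterations, the simplified title, and the assumption that keys is a list of four lists of strings like 'a1, b1' are all guesses on my part, not something the question confirms.

```python
import matplotlib.pyplot as plt


def plot_results(results, keys, test, iterations):
    """Draw one boxplot panel per group of keys on a 2x2 grid.

    `iterations` is passed in explicitly instead of being read from the
    global scope; what it stands for is an assumption.
    """
    fig, axs = plt.subplots(2, 2, figsize=(10, 10))
    for ax, group in zip(axs.flat, keys):
        # keep only the keys of this panel that are present in the results
        subset = {k: results[k] for k in group if k in results}
        ax.boxplot(list(subset.values()), showfliers=False, patch_artist=True)
        ax.set_xticks(range(1, len(subset) + 1))
        ax.set_xticklabels(list(subset.keys()))
        ax.set_ylim(0, 1)
        # assumes keys look like 'a1, b1', so the part before the comma is the fixed a
        fixed_a = group[0].split(',')[0]
        ax.set_title(f"{test}, {fixed_a}, iters={iterations}", fontweight="bold")
    return fig, axs
```

Calling it with a name that does not exist now fails immediately with an error at the call site, instead of silently using whatever happens to be defined in the notebook.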

Anyway, your problem can be tackled first by adapting your code to use seaborn. Check the seaborn boxplot example "Draw a boxplot with nested grouping by two categorical variables".

The method that can be more easily modified to fit your use case is this: generate a set of x values, one for each boxplot group. Then shift each box to the left or right of its group's x value, depending on where you want to place it. After that you have to fix the ticks and so on, but you already know how to do that. Here's an example that keeps as much of your structure as possible.

import matplotlib.pyplot as plt
import numpy as np

# sample data: both dictionaries share the same keys
np.random.seed(42)
resultsOwnAlgorithm = {'a1, b1': np.random.normal(scale=2, size=20), 'a2, b2': np.random.normal(scale=1.5, size=20)}
resultsBaseline = {'a1, b1': np.random.normal(scale=2, size=20), 'a2, b2': np.random.normal(scale=1.5, size=20)}

# one x position per key; each algorithm is drawn slightly left or right of it
x_vals = np.arange(0, len(resultsOwnAlgorithm))
xs = {key:val for key, val in zip(resultsOwnAlgorithm.keys(), x_vals)}
shift = 0.1

fig, ax = plt.subplots()
for key in resultsOwnAlgorithm.keys():
    # own algorithm in red on the left, baseline in blue on the right
    ax.boxplot(resultsOwnAlgorithm[key], positions=[xs[key] - shift], boxprops=dict(color='r'))
    ax.boxplot(resultsBaseline[key], positions=[xs[key] + shift], boxprops=dict(color='b'))

# put a single tick per key at the centre of each pair of boxes
ax.set_xticks(x_vals)
ax.set_xticklabels(list(resultsOwnAlgorithm.keys()))

This yields a graph with a red box (own algorithm) and a blue box (baseline) side by side at each key.
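If you want to keep your 2x2 layout, the same shifting idea can be wrapped in a small helper and called once per subplot. This is only a sketch: the helper name grouped_boxplot, the nested dictionary layout {algorithm name: results dictionary}, the matplotlib default colors C0, C1, ... and the random sample values are my own assumptions, not part of your code.

```python
import matplotlib.pyplot as plt
import numpy as np


def grouped_boxplot(ax, results_by_algorithm, keys, width=0.3):
    """Draw one box per algorithm at each key, shifted around a shared x position."""
    n = len(results_by_algorithm)
    centers = np.arange(len(keys))
    # offsets are symmetric around 0, e.g. [-0.15, 0.15] for two algorithms
    offsets = (np.arange(n) - (n - 1) / 2) * width
    for i, (name, results) in enumerate(results_by_algorithm.items()):
        color = f"C{i}"
        ax.boxplot(
            [results[k] for k in keys],
            positions=centers + offsets[i],
            widths=width * 0.8,
            showfliers=False,
            boxprops=dict(color=color),
        )
        # empty line, drawn only so that each algorithm gets a legend entry
        ax.plot([], [], color=color, label=name)
    ax.set_xticks(centers)
    ax.set_xticklabels(keys)
    ax.legend()
    return centers, offsets


# hypothetical data: 4 values of a, 4 values of b, 20 random scores each
rng = np.random.default_rng(0)
keys = [[f"a{i}, b{j}" for j in range(1, 5)] for i in range(1, 5)]
flat = [k for group in keys for k in group]
results = {
    "own": {k: rng.uniform(0.4, 1.0, 20) for k in flat},
    "baseline": {k: rng.uniform(0.2, 0.9, 20) for k in flat},
}

fig, axs = plt.subplots(2, 2, figsize=(10, 10))
for ax, group in zip(axs.flat, keys):
    grouped_boxplot(ax, results, group)
    ax.set_ylim(0, 1)
```

Because the offsets are computed from the number of algorithms, adding a third dictionary to results should just add a third box per group; call plt.show() at the end to display the figure.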

  • The easiest solution is probably to combine all of the dictionaries into a single pandas.DataFrame. This makes the data easy to analyze and plot.
    1. Iterate through a list of the dictionaries, zipped with a string that identifies where each dictionary's data came from.
    2. Create the DataFrame.
    3. Add a new column to identify the data.
    4. Append the DataFrame to a list.
    5. Combine the list of DataFrames with pd.concat, and reset the index.
    6. Reshape the DataFrame into long form with pd.DataFrame.melt.
  • Seaborn is a high-level API for matplotlib; it easily plots long-form data and separates the groups with the hue parameter.
  • Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2

import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# create sample dictionaries
np.random.seed(2022)
custom = {f'a{v}, b{v}': np.random.normal(scale=v, size=100) for v in range(1, 5)}
baseline = {f'a{i}, b{i}': np.random.normal(scale=v, size=100) for i, v in enumerate(np.arange(1.5, 5.5), 1)}

# create and shape dataframe
dfs = list()
for d, _id in zip([baseline, custom], ['baseline', 'custom']):
    df = pd.DataFrame(d)  # one column per dictionary key
    df['Algorithm'] = _id  # label every row with the algorithm it came from
    dfs.append(df)
dfs = pd.concat(dfs).reset_index(drop=True)
# long form: one row per (Algorithm, Parameters, Score)
dfm = dfs.melt(id_vars='Algorithm', var_name='Parameters', value_name='Score')

# plot
g = sns.catplot(kind='box', data=dfm, x='Parameters', y='Score', hue='Algorithm', height=6, aspect=2)
plt.show()

DataFrame Views

  • dfs.head()
     a1, b1    a2, b2    a3, b3    a4, b4 Algorithm
0  0.834463 -1.092923  4.875117 -4.946214  baseline
1  1.338891  0.225008 -0.305499  0.570333  baseline
2  0.261615  2.128844  2.194177  0.494803  baseline
3  0.273740 -2.395624 -3.495572  0.006312  baseline
4 -0.997368  0.984808 -3.956302  0.206667  baseline
  • dfm.head()
  Algorithm Parameters     Score
0  baseline     a1, b1  0.834463
1  baseline     a1, b1  1.338891
2  baseline     a1, b1  0.261615
3  baseline     a1, b1  0.273740
4  baseline     a1, b1 -0.997368
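One caveat: pd.DataFrame(d) raises a ValueError when the lists in a dictionary have different lengths, which can happen if some parameter combinations were run more often than others. A possible workaround, sketched below, is to build the long-form rows directly and skip the melt step. The helper name to_long_form, the nested input layout and the sample scores are hypothetical; only the column names match the ones used above.

```python
import pandas as pd


def to_long_form(results_by_algorithm):
    """Flatten {algorithm: {parameters: scores}} into one row per score."""
    rows = [
        {"Algorithm": algorithm, "Parameters": params, "Score": score}
        for algorithm, results in results_by_algorithm.items()
        for params, scores in results.items()
        for score in scores
    ]
    return pd.DataFrame(rows, columns=["Algorithm", "Parameters", "Score"])


# hypothetical scores; the lists deliberately differ in length
long_df = to_long_form({
    "baseline": {"a1, b1": [0.7, 0.6], "a1, b2": [0.5]},
    "custom": {"a1, b1": [0.8, 0.7, 0.6], "a1, b2": [0.9, 0.8]},
})
```

The resulting frame has the same three columns as dfm, so it should be usable with the same sns.catplot call by passing data=long_df.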
