
No outlier detection in boxplot

I would like to plot boxplots of dataframes (see sample code below). What I'm wondering is: how can I disable the detection of outliers? I don't want to remove them; I just want a plot that visualizes the data by marking 0%, 25%, 50% and 75% of the data points, without applying any outlier criteria.

How do I have to modify my code to achieve this? Can I change the outlier detection criteria so that detection is effectively disabled?

I would be very grateful for any help, and if there is already another thread about this (which I didn't find), I would be happy to get a link to it.

Many thanks! Jordin

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10, 4),
                  columns=['Col1', 'Col2', 'Col3', 'Col4'])

plt.figure()
plt.boxplot(df.values)
plt.show()

EDIT:

[Image: the data point in the upper right is marked as an outlier]

I would like to include this outlier when drawing the whiskers, not merely hide it.

If you add sym='' to your plot call, I think you will get what you ask for:


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10, 4),
                  columns=['Col1', 'Col2', 'Col3', 'Col4'])

df.boxplot(sym='')

You're looking for the whis parameter.

From the documentation:

whis : float, sequence, or string (default = 1.5)

As a float, determines the reach of the whiskers beyond the first and third quartiles. In other words, where IQR is the interquartile range (Q3 - Q1), the upper whisker will extend to the last datum less than Q3 + whis*IQR. Similarly, the lower whisker will extend to the first datum greater than Q1 - whis*IQR. Beyond the whiskers, data are considered outliers and are plotted as individual points. Set this to an unreasonably high value to force the whiskers to show the min and max values. Alternatively, set this to an ascending sequence of percentiles (e.g., [5, 95]) to set the whiskers at specific percentiles of the data. Finally, whis can be the string 'range' to force the whiskers to the min and max of the data.
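To make the rule quoted above concrete, here is a minimal sketch that recomputes the whisker ends by hand with NumPy. The helper name whisker_limits and the sample data are made up for illustration, and the code only approximates what Matplotlib does internally (for example, it treats a datum lying exactly on the fence as inside).

```python
import numpy as np

def whisker_limits(values, whis=1.5):
    """Return (lower, upper) whisker ends for a 1-D sample.

    Hypothetical helper for illustration; not part of Matplotlib or pandas.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    # Fences: anything outside them would be drawn as an outlier.
    lo_fence = q1 - whis * iqr
    hi_fence = q3 + whis * iqr
    inside = values[(values >= lo_fence) & (values <= hi_fence)]
    # Whiskers stop at the most extreme data points inside the fences.
    return inside.min(), inside.max()

data = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
print(whisker_limits(data))           # default whis=1.5: 20.0 lies beyond the upper fence
print(whisker_limits(data, whis=99))  # very large whis: whiskers reach min and max
```

With the default factor the value 20.0 falls outside the upper fence (Q3 + 1.5*IQR = 8.5 for this sample), so the upper whisker stops at 5.0. With whis=99 the fences are so wide that every point is inside and the whiskers span the full range.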

Add it like so:

df.boxplot(whis=99)
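As a possible alternative to picking a large number, the percentile form of whis can pin the whiskers to the 0th and 100th percentiles. The sketch below applies it to the question's sample data and uses matplotlib.cbook.boxplot_stats to check that no points are left over as outliers. It assumes a Matplotlib version that accepts a pair of percentiles for whis; as far as I know, the string 'range' quoted above was deprecated in later releases in favour of this form, so treat the exact spelling as version dependent.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cbook

np.random.seed(1234)
df = pd.DataFrame(np.random.randn(10, 4),
                  columns=['Col1', 'Col2', 'Col3', 'Col4'])

# Compute the statistics the boxplot would draw, with whiskers at the
# 0th and 100th percentiles (i.e. min and max of each column).
stats = cbook.boxplot_stats(df.values, whis=(0, 100))
for col, s in zip(df.columns, stats):
    print(col, s['whislo'], s['whishi'], len(s['fliers']))

# Same setting applied to the actual plot.
fig, ax = plt.subplots()
ax.boxplot(df.values, whis=(0, 100))
```

Each column should report zero fliers, with whislo and whishi equal to the column's minimum and maximum. Call plt.show() afterwards to display the figure.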
