
How to prevent seaborn from smoothing the histogram when plotting a density plot using seaborn.distplot?

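Here is the problem I am facing. I am trying to plot a density plot (i.e. a smooth approximation of a histogram) using seaborn.distplot(), and I get the following plot:

[Figure: density plot]

The problem with the plot above is that the contour on the far left extends well beyond -1.0, which I do not want, because a similarity score cannot be less than -1.0 (i.e. it should lie in the closed interval [-1.0, 1.0]). I checked the input array (shown below) for values less than -1.0, and there are none. So it seems that seaborn.distplot() smooths the distribution in a way that extends it past -1.0. How can I stop this from happening? I tried setting xlim on the x-axis, but that leaves no space on the left side of the plot (the way there is a separate column of space after +1.0 on the far right).

To illustrate, here is a sample input array and the code I use for plotting: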

import numpy as np

arr = np.array([-0.35416853, -0.28675528, -0.54088942,  0.18797232,  0.01707244,
       -0.48090636, -0.44454523, -0.03228283, -0.70861904,  0.02323842,
       -0.54905541, -0.5421915 ,  0.27547336, -0.92913273, -0.55379011,
       -0.23521681, -0.1079175 , -0.24065031, -0.33773661, -0.06147251,
       -0.74171701, -0.74315048,  0.06634989, -0.49222919,  0.48899574,
        0.13499221,  0.53120786, -0.1688146 ,  0.47125832,  0.36517109,
        0.33110315,  0.34495851,  0.18393   ,  0.67211736,  0.11608325,
       -0.92913273, -0.71209124,  0.01828323,  0.30894561, -0.06463642,
        0.45423401, -0.7993457 ,  0.50007295,  0.17983021, -0.66105515,
       -0.92783269, -0.49277017, -0.19487059,  0.07502782,  0.00700057,
        0.29958942, -0.04223299,  0.04105657, -0.12604522,  0.30506049,
       -0.15600141, -0.17434894,  0.01152945, -0.11583157,  0.07010729,
       -0.92913273, -0.02566766,  0.48114331, -0.13252103, -0.42600686,
        0.54836633,  0.37945642, -0.34006735, -0.29560479,  0.4930249 ,
        0.02693856,  0.57255816,  0.31185216,  0.19780182,  0.11909931,
       -0.02853919, -0.25082142, -0.08635957, -0.28266912, -0.80937364,
       -0.92913273, -0.0172393 , -0.18993503, -0.69080226, -0.66901143,
        0.0470842 , -0.45307088,  0.05043218, -0.20894534, -0.22218531,
        0.5189177 , -0.92913273,  0.31509469, -0.15935917, -0.92913273,
       -0.41652189,  0.20265061,  0.016976  ,  0.0680205 ,  0.33159134,
       -0.3138477 ,  0.10086817,  0.37074665, -0.06916329, -0.19177307,
        0.22842641, -0.15087903,  0.34376167,  0.24173604, -0.38040409,
       -0.20031291,  0.17990511,  0.40231535, -0.27195479, -0.15867829,
        0.2389052 ,  0.08337308, -0.07327617, -0.77566734, -0.12074809,
        0.19539527,  0.03727124, -0.13330546,  0.13602168,  0.36673224,
       -0.3434154 ,  0.19251896,  0.27692974,  0.4757158 ,  0.24333386,
        0.29905657,  0.57319178,  0.46753947, -0.04079389,  0.5571865 ,
        0.3453707 ,  0.55110949,  0.19614831,  0.61707333,  0.3680048 ,
        0.48193126,  0.67330892,  0.53603774,  0.54464057,  0.35016492,
        0.36970268,  0.150395  ,  0.4697073 ,  0.3383952 ,  0.4037419 ,
       -0.01055328,  0.26734498,  0.2647191 ,  0.30056532,  0.46706568,
        0.41460328,  0.42295413,  0.44188908,  0.29304088, -0.18437651,
       -0.33404869,  0.31744862,  0.16578238, -0.2903621 , -0.36128032,
       -0.65571561,  0.39868119, -0.31359498,  0.45377302,  0.23929229,
        0.19958669,  0.51978988, -0.01249307, -0.16404641,  0.27193916,
       -0.11159726, -0.10719093,  0.05472177, -0.64784851,  0.25594644,
       -0.26109644, -0.28908332,  0.06264426,  0.05689891,  0.26437733,
       -0.29424862,  0.26441642,  0.34868516,  0.00497344, -0.46811445,
       -0.35795662, -0.04599685,  0.08701907, -0.32572399,  0.17639076,
        0.35640737, -0.08174591, -0.13910904,  0.35387245,  0.00857055,
       -0.24789401,  0.24033791, -0.08525459,  0.19189512,  0.27148848,
       -0.38631975, -0.08820518,  0.12658585,  0.23404602,  0.06062359,
        0.13340842, -0.11942433, -0.15974527, -0.0236961 ,  0.01533685,
       -0.92641117,  0.01533685, -0.00582898,  0.08251113, -0.18537655,
       -0.92641117, -0.63036561, -0.02408175, -0.10033362, -0.08820518,
        0.01533685, -0.1475904 , -0.06573955, -0.10033362, -0.08820518,
       -0.08820518,  0.04798457,  0.29057868,  0.08310757,  0.25168328,
        0.03989156,  0.1895359 , -0.44324531, -0.16724842,  0.06172038,
        0.05685105,  0.3381661 , -0.46472578, -0.13137012,  0.10249921,
        0.26703853,  0.14798872,  0.09729466, -0.09559039,  0.38893042,
        0.6081168 , -0.32574556, -0.11493626,  0.30370567, -0.13203101,
        0.12251789,  0.29993512, -0.80796771, -0.14717629,  0.37894796,
        0.30086822,  0.26228619, -0.01403568, -0.46596314, -0.11860131,
       -0.52649509,  0.41834337,  0.25892792,  0.40497516, -0.0287142 ,
       -0.14994142,  0.41714702,  0.40928704,  0.0595943 ,  0.5190621 ,
        0.53760238,  0.25452441, -0.08397463,  0.22131469, -0.46173602,
        0.48456617,  0.44220971,  0.16059022,  0.43723123,  0.04680989,
       -0.00131657, -0.09681387, -0.48600167, -0.44205123,  0.13787778,
       -0.02900436,  0.07049823,  0.02565475, -0.20544388,  0.0297263 ,
        0.09162641, -0.17354248, -0.41518963,  0.12393266, -0.41754063,
       -0.19018751,  0.02251257, -0.27799953,  0.21135703,  0.09597453,
        0.56175636,  0.34126265,  0.17056669,  0.13149045, -0.30472518,
       -0.07366951,  0.42843431, -0.22890901,  0.05518269, -0.01007775,
       -0.48123104, -0.44906545,  0.09229373, -0.85684002,  0.23411821,
        0.02637603,  0.02477345,  0.21678001, -0.14454807,  0.32430986,
       -0.12988135,  0.07014938,  0.17991853, -0.02405694, -0.83110188,
       -0.11192697,  0.02312546, -0.10770876,  0.13470276,  0.10568144,
       -0.20336714, -0.15739212,  0.21271663,  0.05357167,  0.3281988 ,
        0.17442453,  0.11561338, -0.68398479, -0.03704769,  0.28698584,
        0.17608064,  0.30424182,  0.51034264, -0.09452418,  0.38242868,
       -0.60014916,  0.21856565, -0.04819684,  0.2653766 ,  0.02992649,
        0.18941891, -0.04752845,  0.02295903, -0.29201727,  0.07913569,
       -0.12563984,  0.21124929, -0.18801383, -0.24118712, -0.29686842,
        0.27609838, -0.23855832,  0.31970457,  0.41328374,  0.19630546,
        0.34077982, -0.3704136 ,  0.17032295,  0.20643397,  0.34154881,
        0.1504677 ,  0.37392242,  0.25842101, -0.50553798,  0.35387764,
        0.41873554,  0.27067669,  0.31011181, -0.51092977, -0.10282291,
       -0.4126883 , -0.52383119, -0.82821877, -0.4585979 ,  0.2531493 ,
        0.34361492,  0.38418371, -0.22988404,  0.285816  , -0.40203361,
        0.38114577,  0.15781548,  0.27335741,  0.36371593,  0.36515941])

import matplotlib.pyplot as plt
import seaborn as sns

ax = sns.distplot(arr, hist=False, kde_kws={"shade": True}, norm_hist=True, label="density plot")

# dashed vertical guide lines at x = -0.208 and x = 0.317
plt.plot(np.array([-0.208, -0.208]), np.array([0, 2]), color='grey', linestyle='--')
plt.plot(np.array([0.317, 0.317]), np.array([0, 2]), color='grey', linestyle='--')
ax.set_xlabel(r"similarity")
ax.set_ylabel(r"density")
plt.show()

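So, I do not want this smoothing to happen on the left side of the plot, while still keeping one column of spacing at the far right of the plot. How can I achieve this? Thanks!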

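One way to get the desired result is to use a custom window and kernel. Both the kernel and the window should depend on the position of the window's center relative to the end points a and b of the interval.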

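Seaborn uses scipy's stats.gaussian_kde, or the statsmodels KDE estimator if statsmodels is installed. As far as I know, gaussian_kde does not allow this kind of adjustment, so we need to implement a custom KDE estimator.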
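
To see why a plain Gaussian KDE spills over the boundary, the sketch below measures how much probability mass scipy's gaussian_kde places below -1 even though no data point lies there. The synthetic `scores` array and the variable names are assumptions made for illustration; they are not the question's data.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic stand-in for the similarity scores (an assumption, not the question's data).
rng = np.random.default_rng(0)
scores = np.clip(rng.normal(loc=-0.6, scale=0.4, size=500), -1.0, 1.0)

kde_est = gaussian_kde(scores)  # default bandwidth (Scott's rule)

# Probability mass that the estimate assigns below, inside and above [-1, 1].
mass_below = kde_est.integrate_box_1d(-np.inf, -1.0)
mass_inside = kde_est.integrate_box_1d(-1.0, 1.0)
mass_above = kde_est.integrate_box_1d(1.0, np.inf)

print(f"smallest data value: {scores.min():.3f}")
print(f"mass below -1:       {mass_below:.4f}")
print(f"mass inside [-1, 1]: {mass_inside:.4f}")
```

Every Gaussian bump centred on a data point near -1 has a tail that reaches past -1, so the estimate is nonzero there no matter how the data are distributed.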

Take a look at the code snippet below. It works and can be treated as a starting point for further improvement.

import numpy as np
from scipy.integrate import quad


class kde:
    """Gaussian KDE on [a, b] whose window narrows near the edges."""

    def __init__(self, a, b, kernel=None):
        # `kernel` is accepted but unused; the Gaussian kernel below is always applied
        self.a = a
        self.b = b
        self.rthumb = None  # rule-of-thumb bandwidth, set by fit()

    def h(self, x):
        r""" h(x) window size depends on position of the center of the window relative to (a, b).

        _ r    ___________   <-- rthumb = r = height of the graph
        |     /           \
        |    /             \
        |---|--|----------|-|-----
            a  a+r      b-r b

        """
        eps = np.finfo(float).eps  # keeps h(x) strictly positive, avoids division by zero
        if self.a + self.rthumb <= x <= self.b - self.rthumb:
            return self.rthumb
        elif self.a <= x < self.a + self.rthumb:
            return x - self.a + eps
        elif self.b - self.rthumb < x <= self.b:
            return self.b - x + eps
        else:
            return eps

    def kernel(self, x):
        # standard normal density
        return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

    def window(self, x):
        """ x - scalar value; returns the window centred at x as a function of y """
        def w(y):
            y = np.asarray(y, dtype=float)
            res = self.kernel((x - y) / self.h(x))  # gaussian kernel
            # window is zero outside [a, b]
            return np.where((y > self.b) | (y < self.a), 0.0, res)
        return w

    def fit(self, data):
        data = np.asarray(data, dtype=float)
        # Rule of thumb: 1.06 * sigma * n^(-1/5); -0.2 avoids integer division on Python 2
        self.rthumb = 1.06 * np.std(data) * len(data) ** -0.2

        def _pdf(x):
            # unnormalized estimate at scalar x
            return self.window(x)(data).sum() / len(data)

        # normalize so that the estimate integrates to 1 over [a, b]
        val = quad(_pdf, self.a, self.b)[0]
        self.pdf = np.vectorize(lambda x: _pdf(x) / val)
        return self
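
Written as formulas, the class above appears to compute the following. This is one reading of the code rather than a statement from the answer: the tiny floating-point epsilon is left out, boundary points are treated loosely, every data point x_i is assumed to lie in [a, b], and r, sigma and n stand for rthumb, np.std(data) and len(data).

```latex
r = 1.06\,\sigma\,n^{-1/5}, \qquad
h(x) =
\begin{cases}
r,     & a + r \le x \le b - r,\\
x - a, & a \le x < a + r,\\
b - x, & b - r < x \le b,
\end{cases}

\tilde f(x) = \frac{1}{n}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h(x)}\right),
\qquad K(u) = \frac{1}{\sqrt{2\pi}}\,e^{-u^2/2},

\hat f(x) = \frac{\tilde f(x)}{\int_a^b \tilde f(t)\,dt}, \qquad a \le x \le b.
```

Because the kernel is not divided by h(x), the raw estimate is not a density on its own; the final division by the integral is what makes the area equal to 1.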

If we apply it to your data:

import matplotlib.pyplot as plt

k = kde(-1, 1)
x = np.linspace(-1, 1, 100)
plt.plot(x, k.fit(arr).pdf(x))

we get:

[Figure: the resulting density estimate on [-1, 1]]

When the sliding window is close to an edge of the interval, it narrows and is cut off at the boundary:

plt.plot(x, k.window(0.9)(x), 'r.', x, k.window(0)(x), x, k.window(-.9)(x),'r.')
plt.show()

[Figure: windows centred at -0.9, 0 and 0.9]
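
The narrowing can also be checked numerically. The sketch below is a hypothetical stand-alone version of the window-size rule; the function name `bandwidth` and the value r = 0.1 are assumptions chosen for illustration, and the interval is assumed to be at least 2r wide.

```python
import numpy as np


def bandwidth(x, a, b, r):
    """Hypothetical stand-alone window size: r in the middle, shrinking near a and b."""
    eps = np.finfo(float).eps  # keeps the result strictly positive
    if x < a or x > b:
        return eps
    edge_distance = min(x - a, b - x)  # distance to the nearest end point
    return r if edge_distance >= r else edge_distance + eps


# Illustrative bandwidth; the rule of thumb would compute this from the data.
r = 0.1
for center in (-1.0, -0.95, 0.0, 0.95, 1.0):
    print(f"h({center:+.2f}) = {bandwidth(center, -1.0, 1.0, r):.3g}")
```

Within distance r of either end point the window size equals the distance to that end point, so a window centred exactly on the boundary collapses to almost nothing.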

Note that this custom class produces a normalized pdf estimate, i.e. AUC(kde.pdf) = 1.
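
The normalization claim can be checked independently with the trapezoidal rule. The sketch below uses a compact, hypothetical function `bounded_kde` that follows the same idea as the class (rule-of-thumb bandwidth, window shrinking near the edges, division by the integral); the synthetic `sample` is an assumption, not the question's data.

```python
import numpy as np
from scipy.integrate import quad


def bounded_kde(data, a, b):
    """Hypothetical compact variant of the estimator; returns a normalized pdf on [a, b]."""
    data = np.asarray(data, dtype=float)
    r = 1.06 * np.std(data) * len(data) ** -0.2  # rule-of-thumb bandwidth
    eps = np.finfo(float).eps

    def raw(x):
        # window size shrinks linearly within distance r of either end point
        h = min(r, max(x - a, 0.0), max(b - x, 0.0)) + eps
        u = (x - data) / h
        return np.mean(np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi))

    area = quad(raw, a, b, limit=200)[0]  # normalizing constant
    return np.vectorize(lambda x: raw(x) / area)


rng = np.random.default_rng(1)
sample = rng.uniform(-1.0, 1.0, size=300)  # synthetic data (an assumption)

pdf = bounded_kde(sample, -1.0, 1.0)

# Independent check: trapezoidal rule on a fine grid over [-1, 1].
grid = np.linspace(-1.0, 1.0, 4001)
values = pdf(grid)
auc = np.sum((values[1:] + values[:-1]) / 2 * np.diff(grid))
print(f"AUC = {auc:.4f}")
```

The two integration methods differ, so the printed area should be close to 1 rather than exactly 1.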

Edit:

I added a small value (the floating-point epsilon) to h(x), and now everything works without warnings.

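In general, a kernel density estimate is sought under the assumption that the theoretical pdf is a smooth function. In your case, you can truncate the pdf obtained with gaussian_kde from scipy and then add a constant to the truncated estimate so that AUC = 1. Some distributions do have discontinuities of the first kind, for example the pdf of the uniform distribution.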
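
A minimal sketch of that truncation idea follows, assuming the constant is chosen as the lost mass spread evenly over the interval. The helper name `truncated_kde` and the synthetic `sample` are hypothetical; only gaussian_kde and its integrate_box_1d method come from scipy.

```python
import numpy as np
from scipy.stats import gaussian_kde


def truncated_kde(data, a, b):
    """Hypothetical helper: gaussian_kde cut off outside [a, b], plus a constant offset."""
    est = gaussian_kde(data)
    mass_inside = est.integrate_box_1d(a, b)
    # Spread the mass lost by truncation uniformly over [a, b].
    offset = (1.0 - mass_inside) / (b - a)

    def pdf(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        inside = (x >= a) & (x <= b)
        return np.where(inside, est(x) + offset, 0.0)

    return pdf


rng = np.random.default_rng(2)
sample = np.clip(rng.normal(loc=-0.5, scale=0.4, size=400), -1.0, 1.0)  # synthetic data

pdf = truncated_kde(sample, -1.0, 1.0)

# Trapezoidal check that the truncated estimate still integrates to about 1.
grid = np.linspace(-1.0, 1.0, 2001)
values = pdf(grid)
area = np.sum((values[1:] + values[:-1]) / 2 * np.diff(grid))
print(f"area on [-1, 1] = {area:.4f}")
print(f"density at -1.2 = {pdf(-1.2)[0]}")
```

Dividing the truncated estimate by `mass_inside` instead of adding an offset is another common way to renormalize; it rescales the curve rather than lifting it.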
