
Inconsistent skewness results between basic skewness formula, Python and R

The data I'm using is pasted below. When I apply the basic formula for skewness to my data in R:

3*(mean(data) - median(data))/sd(data) 

The result is -0.07949198, and I get a very similar result in Python. The median is therefore greater than the mean, which suggests the left tail is longer.

However, when I apply the descdist function from the fitdistrplus package, the skewness is 0.3076471, suggesting the right tail is longer. The SciPy function skew likewise returns a positive skewness of 0.303.

Can I trust this simple formula, which gives me a negative skewness? What is going on here?

Thanks, Oliver

data = c(0.18941565600882029, 1.9861271676300578, -5.2022598870056491, 1.6826411075612353, 1.6826411075612353, -2.9502890173410403, -2.923253150057274, -2.9778296382730454, 0.71202396234488663, 0.71202396234488663, -3.1281373844121529, 1.8326831382748159, -5.2961554710604135, 2.7793190416141234, 0.46922759190417185, 7.0730158730158728, 1.1745152354570636, 2.8142292490118579, 2.037940379403794, 7.0607489597780866, 10.460258249641321, 11.894978479196554, 4.8334682860998655, 1.3884016973125886, 4.0940458015267174, 0.12592959841348539, -0.37022332506203476, 1.9713554987212274, -0.83774145616641893, -1.896978417266187, 6.4340675477239362, -6.4774193548387089, -0.31790393013100438, -4.4193265007320646, 5.7454545454545451, 2.5913432835820895, 0.86190724335591451, 0.95753781950965045, 6.8923556942277697, 1.7650659630606862, -2.4558421851289833, -2.390546528803545, 2.6355029585798815, 0.26983655274888557, 1.5032159264931086, 3.9839506172839503, -5.1404511278195484, -2.2477777777777779, 6.0604444444444443, -0.9691172451489477, 1.1383462670591382, -1.5281319661168078, 4.7775667118950702, 1.2223175965665234, 2.0563555555555553, -3.6153201970443352, -0.35731206188058978, -3.6265094676670238, 1.3053804930332262, -4.4604960677555958, -0.8933514246947083, 0.7622542595019659, 1.3892170651664322, 2.5725258493353031, -0.028006088280060883, 0.8933947772657449, 2.4907086614173228, 3.0914196567862717, 4.4222575516693157, 0.64568527918781726, 0.97095158597662778, -3.7409780775716697, -3.3472636815920396, -0.66307448494453247, -7.0384291725105186, -0.14540612516644474, -0.38161535029004906, 5.1076923076923082, 4.0237516869095806, 1.510099573257468, 1.5064083457526081, -0.025879043600562587, 4.5001414427156998, 3.2326264274061991, 1.0185639229422065, 2.66690518783542, 0.53032015065913374, 1.2117829457364342, 0.60861244019138749, -2.5248049921996878, 1.8666666666666669, -0.32978612415232139, 0.29055999999999998, 1.9150729335494328, 2.2988352745424296, 3.779225265235628, 
0.093884800811976657, 1.0097869890616005, 1.2220632081097198, 0.21164401128494487)

I don't have access to the packages you mention right now, so I can't check which formula they apply. However, you seem to be using Pearson's second skewness coefficient (see Wikipedia). The same page gives the estimator for the sample skewness, which is based on the third central moment and can be calculated simply by:

> S <- mean((data-mean(data))^3)/sd(data)^3
> S
[1] 0.2984792
> n <- length(data)
> S_alt <- S*n^2/((n-1)*(n-2))
> S_alt
[1] 0.3076471

See the alternative definition on the Wikipedia page, which yields the same result (0.3076471) that descdist reports in your example.
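Since the question also mentions Python, here is a minimal sketch of how the moment-based variants relate to each other, using only the standard library. The function name `skew_estimators` and the sample values are hypothetical and chosen for illustration; they are not the data from the question. As far as I know, `scipy.stats.skew` returns the first value (`g1`) by default and the adjusted one when called with `bias=False`, which would explain the 0.303 you saw.

```python
import math
import statistics


def skew_estimators(values):
    """Return three moment-based skewness estimates for a sample."""
    n = len(values)
    mean = statistics.mean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment

    g1 = m3 / m2 ** 1.5                            # divisor n throughout
    s = m3 / statistics.stdev(values) ** 3         # sample sd (n - 1), like R's sd()
    G1 = s * n ** 2 / ((n - 1) * (n - 2))          # adjusted estimator (S_alt above)
    return g1, s, G1


# Hypothetical sample, for illustration only.
sample = [-2.0, -2.0, 1.0, 1.0, 1.0, 1.0, 5.0]
g1, s, G1 = skew_estimators(sample)
n = len(sample)

# The three values differ only by factors that depend on n.
print(math.isclose(s, g1 * ((n - 1) / n) ** 1.5))
print(math.isclose(G1, g1 * math.sqrt(n * (n - 1)) / (n - 2)))
```

All three estimates share the sign of the third central moment, so they can differ in magnitude but never in direction.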

The skewness is generally defined as the third standardized moment, that is, the third central moment divided by the cube of the standard deviation (at least when it is being used by statisticians). The Wikipedia skewness page explains why the definition you found is unreliable. (I had never seen that definition.) The code in descdist is easy to review:

moment <- function(data, k) {
    m1 <- mean(data)        # so this is a "central moment"
    return(sum((data - m1)^k)/length(data))
}
skewness <- function(data) {
    sd <- sqrt(moment(data, 2))
    return(moment(data, 3)/sd^3)
}
skewness(data)
#[1] 0.3030131

The version you use is apparently called 'median skewness' (Pearson's second skewness coefficient); without the factor of 3 it is known as 'non-parametric skew'. See: https://stats.stackexchange.com/questions/159098/taming-of-the-skew-why-are-there-so-many-skew-functions
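To see that the two kinds of measure can genuinely disagree in sign, here is a small Python sketch using only the standard library. The function names and the seven-point sample are made up for illustration and are not the data from the question.

```python
import statistics


def median_skewness(values):
    """Pearson's second coefficient: 3 * (mean - median) / sample sd."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    return 3 * (mean - median) / statistics.stdev(values)


def moment_skewness(values):
    """Third central moment over the 1.5 power of the second (divisor n)."""
    n = len(values)
    mean = statistics.mean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical sample: mean is 5/7, median is 1.
sample = [-2.0, -2.0, 1.0, 1.0, 1.0, 1.0, 5.0]

print(median_skewness(sample))  # negative, because the mean is below the median
print(moment_skewness(sample))  # positive, because the cubed deviation of 5.0 dominates
```

The median-based measure only looks at where the mean sits relative to the median, while the moment-based one weights observations by their cubed distance from the mean, so a few large values on one side can outweigh many moderate values on the other.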

