
Plotting an ECDF in R from histogram data

I have histogram data of the form

Key  |  #occurrences_of_key
--------------------------
 -10 | 1200
   0 | 1000
  10 | 700
  33 | 500
  67 | 200
  89 | 134
--------------------------

Code to make it:

structure(c(-10, 0, 10, 33, 67, 89, 1200, 1000, 700, 500, 200, 134), .Dim = c(6L, 2L))

I want to plot an empirical cumulative distribution function (ECDF, i.e. a percentile chart) in R from this data. I am new to R, so I would appreciate any pointers. I read about the ecdf function available in R, but I find it hard to follow.

One way I can think of would be to use rep to reconstruct the original data and use ecdf on that.

mat <- structure(c(-10, 0, 10, 33, 67, 89, 1200, 1000, 700, 500, 200, 134), .Dim = c(6L, 2L))

original <- rep(mat[, 1], times = mat[, 2])

original_ecdf <- ecdf(original)

plot(original_ecdf)
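To see what the plot is showing, the step heights can also be computed directly from the counts and compared with the fitted ECDF. This is only a sketch on the sample data from the question; the names keys, counts, heights and median_key are introduced here for illustration.

```r
# Histogram data from the question: column 1 = key, column 2 = count
mat <- structure(c(-10, 0, 10, 33, 67, 89, 1200, 1000, 700, 500, 200, 134), .Dim = c(6L, 2L))

keys   <- mat[, 1]
counts <- mat[, 2]

# Rebuild the raw sample and fit the ECDF, as in the code above
original_ecdf <- ecdf(rep(keys, times = counts))

# Step heights computed directly from the counts (cumulative proportions)
heights <- cumsum(counts) / sum(counts)

# The ECDF evaluated at each key should agree with the cumulative proportions
print(cbind(key = keys, ecdf = original_ecdf(keys), direct = heights))

# Percentiles can be read off with quantile(); 0.5 gives the median key
median_key <- quantile(original_ecdf, 0.5)
print(median_key)
```

The ecdf and direct columns should agree row by row, which is why the ECDF can in principle be built from the tabulated counts alone.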


If your data is huge (which is presumably why you pre-tabulated it before loading it into R), you don't want to expand it back into raw observations. You can instead adapt the implementation of ecdf to accept tabulated data:

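tab_ecdf <- function (xs, counts) 
{
  # xs: distinct values; counts: number of occurrences of each value
  n <- sum(counts)
  if (n < 1) 
    stop("'counts' must sum to 1 or more")
  # approxfun(ties = "ordered") expects increasing xs, so sort both vectors first
  ord <- order(xs)
  rval <- approxfun(xs[ord], cumsum(counts[ord]) / n, 
                    method = "constant", yleft = 0, yright = 1, f = 0, ties = "ordered")
  # give the result the same class, "nobs" and "call" as an object from ecdf()
  class(rval) <- c("ecdf", "stepfun", class(rval))
  assign("nobs", n, envir = environment(rval))
  attr(rval, "call") <- sys.call()
  rval
}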

And then use it instead of the original ecdf() function.
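As a rough usage sketch on the sample data from the question: the snippet below repeats a compact copy of tab_ecdf() so that it runs on its own (this copy assumes xs is already sorted in increasing order), and checks the result against ecdf() on the expanded data. The names tab, ref, probe and same are illustrative only.

```r
# Compact copy of tab_ecdf() so this snippet is self-contained.
# Assumption: xs is sorted in increasing order and has no duplicates.
tab_ecdf <- function(xs, counts) {
  n <- sum(counts)
  if (n < 1) stop("'counts' must sum to 1 or more")
  rval <- approxfun(xs, cumsum(counts) / n,
                    method = "constant", yleft = 0, yright = 1, f = 0, ties = "ordered")
  class(rval) <- c("ecdf", "stepfun", class(rval))
  assign("nobs", n, envir = environment(rval))
  attr(rval, "call") <- sys.call()
  rval
}

mat <- structure(c(-10, 0, 10, 33, 67, 89, 1200, 1000, 700, 500, 200, 134), .Dim = c(6L, 2L))

# ECDF built straight from the table, without expanding the data
tab <- tab_ecdf(mat[, 1], mat[, 2])

# Reference ECDF built from the expanded data, for comparison only
ref <- ecdf(rep(mat[, 1], times = mat[, 2]))

# Compare both functions at the keys, between keys, and outside the range
probe <- c(-20, -10, -5, 0, 5, 10, 50, 89, 100)
same <- isTRUE(all.equal(tab(probe), ref(probe)))
print(same)

# Because the result has class "ecdf", the usual plot method applies
plot(tab, main = "ECDF from tabulated counts", xlab = "Key", ylab = "Fn(x)")
```

On small data the two approaches should be interchangeable; the tabulated version only matters when expanding the counts would be too large to hold in memory.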


 