
5-D Kernel density estimation in R using “kde” function

I want to perform kernel density estimation on 5-dimensional data (x, y, z, time, size) using the "kde" function from the "ks" package in R. Its manual says it can estimate densities for 1- to 6-dimensional data (page 24 of the manual: http://cran.r-project.org/web/packages/ks/ks.pdf).

My problem is that for more than 3 dimensions it says I need to specify eval.points. I don't know how to specify the evaluation points, because the manual has no example for more than 3 dimensions. For example, if I want to generate a regular grid of sequences spanning the space of the problem and use it as eval.points, what should I do?
Here is my data:

422.697323  164.19886   2.457419    8.083796636  0.83367586
423.008236  163.32434   0.5551326   37.58477455  0.893893903
204.733908  218.36365   1.9397874   37.88324312  0.912809449
203.963056  218.4808    0.3723791   43.21775903  0.926406005
100.727581  46.60876    1.4022341   49.41510519  0.782807523
453.335182  244.25521   1.6292517   51.73779175  0.903910803
134.909462  210.96333   2.2389119   53.13433521  0.896529401
135.300562  212.02055   0.6739541   67.55073745  0.748783521
258.237117  134.29735   2.1205291   76.34032587  0.735699304
341.305271  149.26953   3.718958    94.33975483  0.849509216
307.138925  59.60571    0.6311074   106.9636715  0.987923188
307.76875   58.91453    2.6496741   113.8515307  0.802115718
415.025535  217.17398   1.7155688   115.7464603  0.875580325
414.977687  216.73327   1.7107369   115.9776948  0.767143582
311.006135  173.24378   2.7819572   120.8079566  0.925380118
310.116929  174.28122   4.3318722   129.2648401  0.776528535
347.260911  37.34946    3.5155427   136.7851291  0.851787115
351.317624  33.65703    0.5806926   138.7349284  0.909723017
4.471892    59.42068    1.4062959   139.0543783  0.967270976
5.480223    59.72857    2.7326106   139.2114277  0.987787428
199.513023  21.53302    2.5163259   143.5895625  0.864164659
198.718031  23.50163    0.4801849   147.2280466  0.741587333
26.650517   35.2019     0.8246514   150.4876506  0.744788202
25.089379   90.47825    0.8700944   152.1944046  0.777252476
26.307439   88.41552    2.4422487   155.9090026  0.952215177
234.282901  236.11422   1.8115261   155.9658144  0.776284654
235.052948  236.77437   1.9644963   156.6900297  0.944285448
23.048202   98.6261     3.4573048   159.7700912  0.773057491
21.516695   98.05431    2.5029284   160.8202997  0.978779087
213.936324  151.87013   3.1042192   161.0612489  0.80499513
277.887935  197.25753   1.3659279   163.673142   0.758978575
277.239746  197.54001   2.2109361   166.2629868  0.775325157

And this is the code that I am using:

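library(ks)
library(rgl)   # for 3-D plotting
# sep = "," assumes a comma-separated file; drop it for whitespace-separated data like the sample above
kern <- read.table(file.choose(), sep = ",")
hat <- kde(kern)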

It works for up to 3 dimensions, but for 4 and 5 dimensions it fails with: "need to specify eval.points for more than 3 dimensions".
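To make the problem reproducible without file.choose(), the sample above can be read straight from a string. This is a sketch; the column names x, y, z, time and size are taken from the description of the data. With named columns, they should appear in hat$names in place of V1 to V5.

```r
# The sample data from the question, read from a string instead of a file.
# read.table's default separator is whitespace, matching the sample as shown.
txt <- "
422.697323  164.19886   2.457419    8.083796636  0.83367586
423.008236  163.32434   0.5551326   37.58477455  0.893893903
204.733908  218.36365   1.9397874   37.88324312  0.912809449
203.963056  218.4808    0.3723791   43.21775903  0.926406005
100.727581  46.60876    1.4022341   49.41510519  0.782807523
453.335182  244.25521   1.6292517   51.73779175  0.903910803
134.909462  210.96333   2.2389119   53.13433521  0.896529401
135.300562  212.02055   0.6739541   67.55073745  0.748783521
258.237117  134.29735   2.1205291   76.34032587  0.735699304
341.305271  149.26953   3.718958    94.33975483  0.849509216
307.138925  59.60571    0.6311074   106.9636715  0.987923188
307.76875   58.91453    2.6496741   113.8515307  0.802115718
415.025535  217.17398   1.7155688   115.7464603  0.875580325
414.977687  216.73327   1.7107369   115.9776948  0.767143582
311.006135  173.24378   2.7819572   120.8079566  0.925380118
310.116929  174.28122   4.3318722   129.2648401  0.776528535
347.260911  37.34946    3.5155427   136.7851291  0.851787115
351.317624  33.65703    0.5806926   138.7349284  0.909723017
4.471892    59.42068    1.4062959   139.0543783  0.967270976
5.480223    59.72857    2.7326106   139.2114277  0.987787428
199.513023  21.53302    2.5163259   143.5895625  0.864164659
198.718031  23.50163    0.4801849   147.2280466  0.741587333
26.650517   35.2019     0.8246514   150.4876506  0.744788202
25.089379   90.47825    0.8700944   152.1944046  0.777252476
26.307439   88.41552    2.4422487   155.9090026  0.952215177
234.282901  236.11422   1.8115261   155.9658144  0.776284654
235.052948  236.77437   1.9644963   156.6900297  0.944285448
23.048202   98.6261     3.4573048   159.7700912  0.773057491
21.516695   98.05431    2.5029284   160.8202997  0.978779087
213.936324  151.87013   3.1042192   161.0612489  0.80499513
277.887935  197.25753   1.3659279   163.673142   0.758978575
277.239746  197.54001   2.2109361   166.2629868  0.775325157
"
# Blank lines at the start and end of txt are skipped by read.table.
dat <- read.table(text = txt, col.names = c("x", "y", "z", "time", "size"))
str(dat)
```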

I'd also like to know how to plot these estimates. For example, I'd like to use z as the conditioning variable, plot x, y and time in a 3D scatterplot, and use different colors for different ranges of size.

Like you, I wasn't initially able to find a worked example, and the documentation doesn't really describe what sort of object eval.points expects. For your 5-d data I set up a 5-d grid of points built from the 10th, 25th, 50th, 75th and 90th percentiles of each dimension. My dataset was named "dat":

evpts <- do.call(expand.grid,  lapply(dat, quantile, prob=c(0.1,.25,.5,.75,.9)) )

I then passed that to kde, and the function accepted it. Whether this is "correct" still needs checking; no guarantees.
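Below is a sketch of two ways to build the evaluation grid. The first is the percentile grid above, with unname() added so the "10%", "25%", ... labels don't end up on every column (compare the "Named num" entries in the str() output below). The second is the evenly spaced grid the question asked about, which uses seq() from the minimum to the maximum of each column. The helper regular_grid and the random stand-in data are hypothetical and only illustrate the construction.

```r
# Hypothetical stand-in for the question's 5-column data frame.
set.seed(42)
dat <- data.frame(x = runif(30, 0, 460), y = runif(30, 20, 250),
                  z = runif(30, 0.3, 4.4), time = runif(30, 8, 170),
                  size = runif(30, 0.73, 0.99))

# Percentile grid, as in the answer; unname() strips the quantile labels.
probs <- c(0.10, 0.25, 0.50, 0.75, 0.90)
quantile_grid <- do.call(expand.grid,
                         lapply(dat, function(v) unname(quantile(v, probs))))

# Evenly spaced alternative: n points from min to max along every column,
# crossed into a full 5-d grid (one row per evaluation point).
regular_grid <- function(data, n) {
  axes <- lapply(data, function(v) seq(min(v), max(v), length.out = n))
  do.call(expand.grid, axes)
}
reg <- regular_grid(dat, n = 6)

nrow(quantile_grid)  # 5^5 = 3125
nrow(reg)            # 6^5 = 7776
```

Either grid is then passed the same way as in the answer, e.g. kde(dat, eval.points = reg). The grid grows as n^5, so n = 10 already means 100,000 evaluation points.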

> hat <- kde(dat, eval.points= evpts)
> str(hat)
List of 8
 $ x          : num [1:31, 1:5] 423 423 205 204 101 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:5] "V1" "V2" "V3" "V4" ...
 $ eval.points:'data.frame':    3125 obs. of  5 variables:
  ..$ V1: Named num [1:3125] 23 118 234 326 415 ...
  .. ..- attr(*, "names")= chr [1:3125] "10%" "25%" "50%" "75%" ...
  ..$ V2: Named num [1:3125] 35.2 35.2 35.2 35.2 35.2 ...
  .. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
  ..$ V3: Named num [1:3125] 0.581 0.581 0.581 0.581 0.581 ...
  .. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
  ..$ V4: Named num [1:3125] 43.2 43.2 43.2 43.2 43.2 ...
  .. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
  ..$ V5: Named num [1:3125] 0.749 0.749 0.749 0.749 0.749 ...
  .. ..- attr(*, "names")= chr [1:3125] "10%" "10%" "10%" "10%" ...
  ..- attr(*, "out.attrs")=List of 2
  .. ..$ dim     : Named int [1:5] 5 5 5 5 5
  .. .. ..- attr(*, "names")= chr [1:5] "V1" "V2" "V3" "V4" ...
  .. ..$ dimnames:List of 5
  .. .. ..$ V1: chr [1:5] "V1= 23.0482" "V1=117.8185" "V1=234.2829" "V1=326.1557" ...
  .. .. ..$ V2: chr [1:5] "V2= 35.20190" "V2= 59.51319" "V2=149.26953" "V2=211.49194" ...
  .. .. ..$ V3: chr [1:5] "V3=0.5806926" "V3=1.1180112" "V3=1.9397874" "V3=2.5830000" ...
  .. .. ..$ V4: chr [1:5] "V4= 43.21776" "V4= 71.94553" "V4=129.26484" "V4=151.34103" ...
  .. .. ..$ V5: chr [1:5] "V5=0.7487835" "V5=0.7764066" "V5=0.8517871" "V5=0.9190948" ...
 $ estimate   : Named num [1:3125] 3.23e-08 5.70e-08 1.01e-08 4.07e-10 6.20e-12 ...
  ..- attr(*, "names")= chr [1:3125] "1" "2" "3" "4" ...
 $ H          : num [1:5, 1:5] 5073.879 1010.815 1.211 -651.089 -0.223 ...
 $ gridded    : logi FALSE
 $ binned     : logi FALSE
 $ names      : chr [1:5] "V1" "V2" "V3" "V4" ...
 $ w          : num [1:31] 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "class")= chr "kde"
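In this output, estimate holds one density value per row of eval.points (3125 here), and H is the 5 x 5 bandwidth matrix. As a way to see what those numbers mean, here is a hand-rolled sketch of the standard Gaussian kernel estimator with a full bandwidth matrix, f(x) = (1/n) * sum_i K_H(x - X_i). kde_at is a hypothetical helper, not part of ks, and I'm assuming ks's unbinned estimate uses this same form. If so, kde_at(dat, evpts, hat$H) should come out close to hat$estimate.

```r
# Gaussian kernel density estimate with full bandwidth matrix H, evaluated
# at each row of eval_points. Hypothetical helper for illustration only.
kde_at <- function(data, eval_points, H) {
  X <- as.matrix(data)
  E <- as.matrix(eval_points)
  d <- ncol(X)
  Hinv <- solve(H)
  norm_const <- 1 / sqrt((2 * pi)^d * det(H))  # multivariate normal constant
  apply(E, 1, function(p) {
    diffs <- sweep(X, 2, p)                    # row i is X_i - p
    q <- rowSums((diffs %*% Hinv) * diffs)     # quadratic forms (X_i - p)' H^-1 (X_i - p)
    mean(norm_const * exp(-0.5 * q))           # average kernel contribution
  })
}
```

For example, kde_at(matrix(c(0, 0), 1), matrix(c(0, 0), 1), diag(2)) is the standard bivariate normal density at its mean, 1/(2*pi).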

I did find an earlier version of the package documentation that offered this as a worked 4-d example, so I think my effort is essentially the same, apart from the number of dimensions:

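library(ks)
data(iris)
# 4-d example: the four measurements for the setosa species only
ir <- iris[, 1:4][iris[, 5] == "setosa", ]
H.scv <- Hscv(ir)                          # smoothed cross-validation bandwidth matrix
fhat <- kde(ir, H.scv, eval.points = ir)   # evaluate the density at the data points themselves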
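The plotting part of the question isn't covered above. A minimal sketch, assuming the rgl package for the 3-D view: cut() splits z into conditioning intervals and size into color classes, and each z interval gets its own x/y/time scatterplot. The stand-in data, the choice of three z intervals and four size classes, and the quantile-based break points are all illustrative assumptions.

```r
# Hypothetical stand-in data with the question's five columns.
set.seed(1)
dat <- data.frame(x = runif(50, 0, 460), y = runif(50, 20, 250),
                  z = runif(50, 0.3, 4.4), time = runif(50, 8, 170),
                  size = runif(50, 0.73, 0.99))

# Conditioning variable: three z intervals with roughly equal counts.
z_breaks <- quantile(dat$z, probs = seq(0, 1, length.out = 4))
dat$z_class <- cut(dat$z, breaks = z_breaks, include.lowest = TRUE)

# Colors: four size ranges mapped onto a viridis palette.
size_breaks <- quantile(dat$size, probs = seq(0, 1, length.out = 5))
size_class <- cut(dat$size, breaks = size_breaks, include.lowest = TRUE)
pal <- hcl.colors(nlevels(size_class))  # default palette is "viridis"
dat$col <- pal[as.integer(size_class)]

# One 3-D scatterplot of x, y, time per z interval (interactive sessions only).
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  for (lev in levels(dat$z_class)) {
    sub <- dat[dat$z_class == lev, ]
    rgl::open3d()
    rgl::plot3d(sub$x, sub$y, sub$time, col = sub$col,
                xlab = "x", ylab = "y", zlab = "time")
  }
}
```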
