
multivariate density calculations in R

I have a data frame of many numeric variables. Is there a way of calculating (not plotting) areas of the global density which are less dense than others? In other words, is there a way of locating areas of the hyperspace which are very sparsely populated with data points?

Assuming that your data frame looks like this:

df <- data.frame(x = c(rnorm(100,0,3),rnorm(100,12,1),rnorm(100,20,3)), 
                 y = c(rnorm(75,5,2),rnorm(75,-5,3),rnorm(140,10,2),rnorm(10,25,10)))

You can compute a kernel density estimate for each variable and store each result in its own object:

dsx <- density(df$x)
dsy <- density(df$y)

Now look at the result of dsx, for instance. It is a list (an object of class "density") which contains, among other elements:

  • dsx$x coordinates where density is evaluated

  • dsx$y the estimated density at those coordinates
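To see these pieces for yourself, here is a minimal sketch that inspects a density object. The seed and the standalone vector x are assumptions added only to make the snippet reproducible; they are not part of the original answer.

```r
set.seed(42)  # assumption: fixed seed for reproducibility
x <- c(rnorm(100, 0, 3), rnorm(100, 12, 1), rnorm(100, 20, 3))
dsx <- density(x)

class(dsx)      # "density"
length(dsx$x)   # number of grid points, 512 by default
range(dsx$x)    # the grid extends somewhat beyond range(x)
dsx$bw          # bandwidth picked by the default rule
```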

If you want to find the coordinates of sparsely populated areas, you just need to retrieve the coordinates corresponding to low densities. The comparison has to go inside which():

dsx$x[which(dsx$y < 0.03)] # returns coordinates for which density(x) < 0.03
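If you would rather have ranges than individual grid points, one possible sketch is to group consecutive low-density grid points into intervals with rle(). The 0.03 threshold is the same illustrative cutoff as above, and the names low, runs and sparse_intervals are hypothetical helpers introduced for this example.

```r
set.seed(42)  # assumption: fixed seed for reproducibility
x <- c(rnorm(100, 0, 3), rnorm(100, 12, 1), rnorm(100, 20, 3))
dsx <- density(x)

threshold <- 0.03          # illustrative cutoff, tune for your data
low <- dsx$y < threshold   # TRUE where the estimated density is low

# Run-length encode the logical vector to find consecutive stretches
runs   <- rle(low)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Keep only the stretches where the density is below the threshold
sparse_intervals <- data.frame(
  from = dsx$x[starts[runs$values]],
  to   = dsx$x[ends[runs$values]]
)
sparse_intervals
```

Each row is one contiguous interval of x in which the estimated density stays below the threshold. The first and last rows will usually be the tails of the distribution.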

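To combine all your coordinates (here x and y), I would create a data frame with the coordinates and their densities and filter it based on the density values. Note that this pairs the i-th grid point of x with the i-th grid point of y, so the filter works on the two marginal densities, not on the joint density of (x, y).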

df_ds <- data.frame(dsx$x, dsy$x, dsx$y, dsy$y)
df_ds[which((df_ds$dsx.y < 0.03) & (df_ds$dsy.y < 0.01)), c("dsx.x","dsy.x")]
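The filter above works on the two one-dimensional densities separately. If what you need is the joint density of x and y, a different technique is a two-dimensional kernel density estimate, for example with kde2d() from the MASS package. This is a sketch under assumptions, not part of the original answer: the 50 x 50 grid and the "lowest 10% of grid cells" cutoff are arbitrary choices for illustration, and kde2d() only handles two variables at a time.

```r
library(MASS)  # recommended package that ships with standard R installs

set.seed(42)  # assumption: fixed seed for reproducibility
df <- data.frame(
  x = c(rnorm(100, 0, 3), rnorm(100, 12, 1), rnorm(100, 20, 3)),
  y = c(rnorm(75, 5, 2), rnorm(75, -5, 3), rnorm(140, 10, 2), rnorm(10, 25, 10))
)

# Joint 2-D kernel density estimate on a 50 x 50 grid
kd <- kde2d(df$x, df$y, n = 50)

# One row per grid cell; expand.grid varies x fastest, which matches
# the column-major layout of kd$z (rows index x, columns index y)
grid <- expand.grid(x = kd$x, y = kd$y)
grid$density <- as.vector(kd$z)

# Assumption: treat the least dense 10% of grid cells as "sparse"
cutoff <- quantile(grid$density, 0.10)
sparse_cells <- grid[grid$density <= cutoff, ]
head(sparse_cells)
```

sparse_cells then lists (x, y) locations where the estimated joint density is lowest. With more than two variables you would need a different multivariate estimator, and the grid grows quickly with the number of dimensions.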

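By default, density() evaluates the estimate at 512 points per variable. You may want a finer grid, which you can get by setting n in density(). Be sure to use the same value of n for each of your variables, so that the resulting vectors have the same length.

dsx <- density(df$x, n=2048)
dsy <- density(df$y, n=2048)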
