
R boxplot() and summary() of frequency table

I have a data matrix that contains two columns: value, freq.
I want to make a boxplot of value, weighted by the freq column. The matrix is sorted by value.

> data[1:5,]
  value freq
1    28 1184
2    29 1063
3    30 1000
4    31  976
5    32  944

I have read many answers to similar questions; the only solution I found is this one: http://r.789695.n4.nabble.com/boxplot-with-frequencies-counts-td4660330.html

counts<-matrix(rep(data$value,data$freq), ncol=1, byrow=TRUE)
boxplot(counts)

The problem with repeating every value is that it builds an extremely large matrix. R was able to hold it in memory, but I am running R on a virtual machine (Ubuntu), and I wonder whether there is an alternative approach for really big data sets. Perhaps there is a library built for this purpose?

You need the data.table library. Here is an example of the improved performance, using the diamonds dataset from the ggplot2 library:

> count <- as.data.table(rep(diamonds$carat,diamonds$depth),ncol=1,byrow=TRUE)
> count1 <- system.time(matrix(rep(diamonds$carat,diamonds$depth),ncol=1,byrow=TRUE))
> count1
   user  system elapsed 
   0.15    0.02    0.18 
> count <- system.time(as.data.table(rep(diamonds$carat,diamonds$depth),ncol=1,byrow=TRUE))
> count
   user  system elapsed 
   0.04    0.03    0.06 

By scaling down your freq column, you can overcome memory limitations and still get the same boxplot in your case, because the relative frequencies stay nearly unchanged. See the code below. However, if you observe or suspect outliers and want them in your boxplot, you will have to handle and plot them separately.

    > data<-data.frame(value=c(28,29,30,31,32),freq=c(1184,1063,1000,976,944))
    > counts<-matrix(rep(data$value,data$freq), ncol=1, byrow=TRUE)
    > length(counts)
    [1] 5167
    > boxplot(counts,at=1,xlim=c(0,3))
    > counts<-matrix(rep(data$value,round(data$freq/100)), ncol=1, byrow=TRUE)
    > length(counts)
    [1] 52
    > boxplot(counts,at=2,add=TRUE)
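
If the scaled-down approximation is not acceptable, another option is to avoid expanding the table at all. The following is a minimal sketch, not taken from the answers above, and it assumes that value is sorted in ascending order and that freq holds non-negative integer counts. It uses cumulative frequencies to reproduce the Tukey five-number summary that boxplot.stats() computes, then passes the result to bxp(), the base R function that draws a boxplot from precomputed statistics. The helper names kth_value, weighted_fivenum and weighted_boxplot_stats are hypothetical.

```r
# Frequency table from the question (first five rows only).
data <- data.frame(value = c(28, 29, 30, 31, 32),
                   freq  = c(1184, 1063, 1000, 976, 944))

# Hypothetical helper: the k-th smallest observation (1-based) of the
# implied expanded vector, looked up through cumulative counts.
# Assumes `value` is sorted ascending and `freq` are non-negative integers.
kth_value <- function(value, freq, k) {
  value[findInterval(k - 1, cumsum(freq)) + 1]
}

# Tukey five-number summary, following the same index rule as stats::fivenum().
weighted_fivenum <- function(value, freq) {
  n  <- sum(freq)
  n4 <- floor((n + 3) / 2) / 2
  d  <- c(1, n4, (n + 1) / 2, n + 1 - n4, n)
  0.5 * (kth_value(value, freq, floor(d)) +
         kth_value(value, freq, ceiling(d)))
}

# Build the list that bxp() expects, mirroring boxplot.stats():
# whiskers reach the most extreme values within coef * IQR of the hinges.
weighted_boxplot_stats <- function(value, freq, coef = 1.5) {
  keep  <- freq > 0
  value <- value[keep]
  freq  <- freq[keep]
  stats <- weighted_fivenum(value, freq)
  iqr   <- stats[4] - stats[2]
  out   <- value < stats[2] - coef * iqr | value > stats[4] + coef * iqr
  if (any(out)) stats[c(1, 5)] <- range(value[!out])
  n <- sum(freq)
  list(stats = matrix(stats, ncol = 1),
       n     = n,
       conf  = matrix(stats[3] + c(-1.58, 1.58) * iqr / sqrt(n), ncol = 1),
       out   = value[out],          # each distinct outlying value listed once
       group = rep(1, sum(out)),
       names = "value")
}

z <- weighted_boxplot_stats(data$value, data$freq)
bxp(z)  # draws the boxplot without ever building the expanded vector
```

Memory use here grows with the number of rows in the frequency table rather than with the sum of freq. Outlying values are plotted once each instead of once per occurrence, which looks the same on the plot. For the mean part of summary(), weighted.mean(data$value, data$freq) gives the frequency-weighted mean directly.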

