
R: How to sort sizes by human readable

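I'm trying to sort a set of data by size, but the item sizes span a wide range, from ~140K to ~130G, so showing them in bytes isn't very readable. I can change the input data to use human-readable sizes, but when I plot it, the values don't appear in the expected order. How would I sort this by human-readable size?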

Code:

library(ggplot2)

mydata <- read.csv("/path/to/test.csv")
restore.df = data.frame(
    Start = as.POSIXct(mydata$start),
    Size = mydata$size,
    Labels = gsub(" [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}","",mydata$start)
)

p <- ggplot(restore.df, aes(x=Start,y=Size)) + geom_point()
p + scale_x_date(date_labels = "%y-%m-%d", limits = as.Date('2018-06-14', "%y-%m-%d"), as.Date('2018-06-20', "%Y-%m-%d"))

png(filename="/path/to/test.png",width=1368,height=1060,units="px")
print(p)
dev.off()

Result: (image)

Reduced dataset:

start,stop,time,size
"2018-06-14 17:30:05","2018-06-14 17:30:05",3.6,7.3G
"2018-06-14 17:33:47","2018-06-14 17:33:47",1.05,304M
"2018-06-14 17:35:07","2018-06-14 17:35:07",62.9666666666667,132G
"2018-06-14 23:33:51","2018-06-14 23:33:51",0,880K
"2018-06-14 23:34:13","2018-06-14 23:34:13",1.16666666666667,305M
"2018-06-17 01:34:56","2018-06-17 01:34:56",20.2666666666667,6.2G
"2018-06-17 01:56:13","2018-06-17 01:56:13",15.7833333333333,9.4G
"2018-06-22 17:34:33","2018-06-22 17:34:33",0,144K

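For context, the unexpected order most likely comes from the size column being text: a character (or factor) column is compared character by character, not by magnitude, and ggplot2 places such a column on a discrete axis in level order. The short sketch below uses the sizes from the dataset above; `method = "radix"` is chosen here only so the result does not depend on the locale.

```r
Size <- c('7.3G', '304M', '132G', '880K', '305M', '6.2G', '9.4G', '144K')

# A character sort compares text left to right and ignores magnitude,
# so "132G" lands before "144K" and "880K" before "9.4G".
alpha <- sort(Size, method = 'radix')
print(alpha)
# [1] "132G" "144K" "304M" "305M" "6.2G" "7.3G" "880K" "9.4G"
```
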
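I'm not sure whether a package already exists that converts these, but you can convert the sizes yourself and arrange by the result. Then plot and adjust the y-axis labels as needed.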

library(dplyr)
library(ggplot2)
d <- structure(list(start = c("2018-06-14 17:30:05", "2018-06-14 17:33:47", 
                              "2018-06-14 17:35:07", "2018-06-14 23:33:51", "2018-06-14 23:34:13", 
                              "2018-06-17 01:34:56", "2018-06-17 01:56:13", "2018-06-22 17:34:33"), 
                    stop = c("2018-06-14 17:30:05", "2018-06-14 17:33:47", "2018-06-14 17:35:07", 
                             "2018-06-14 23:33:51", "2018-06-14 23:34:13", "2018-06-17 01:34:56", 
                             "2018-06-17 01:56:13", "2018-06-22 17:34:33"), 
                    time = c(3.6, 1.05, 62.9666666666667, 0, 1.16666666666667, 20.2666666666667, 
                             15.7833333333333, 0), 
                    size = c("7.3G", "304M", "132G", "880K", "305M", "6.2G", "9.4G", "144K")), 
                    .Names = c("start", "stop", "time", "size"), class = "data.frame", row.names = c(NA, -8L))

## function to convert sizes
convert_size <- function(x){
  ## plain byte count: return it as a number so sapply() yields a numeric vector
  if(grepl('^[0-9]{1,}$', x)) return(as.numeric(x))
  ## convert when not
  prefix <- substr(x, nchar(x), nchar(x))
  n <- substr(x, 1, nchar(x)-1)
  fct <- dplyr::case_when(
    prefix == 'K' ~ 1024,
    prefix == 'M' ~ 1024^2,
    prefix == 'G' ~ 1024^3,
    prefix == 'T' ~ 1024^4,
  )
  xx <- as.numeric(n)*fct
  return(xx)
}

## unname() drops the names sapply() adds, so they do not end up as row names below
d2 <- d %>% mutate(fsize = unname(sapply(size, convert_size))) %>% arrange(fsize)

restore.df = data.frame(
  Start = as.POSIXct(d2$start),
  Size = d2$size,
  FSize = d2$fsize,
  Labels = gsub(" [0-9]{1,2}:[0-9]{1,2}:[0-9]{1,2}","",d2$start)
)
print(restore.df)
#>                 Start Size        FSize     Labels
#> 1 2018-06-22 17:34:33 144K       147456 2018-06-22
#> 2 2018-06-14 23:33:51 880K       901120 2018-06-14
#> 3 2018-06-14 17:33:47 304M    318767104 2018-06-14
#> 4 2018-06-14 23:34:13 305M    319815680 2018-06-14
#> 5 2018-06-17 01:34:56 6.2G   6657199309 2018-06-17
#> 6 2018-06-14 17:30:05 7.3G   7838315315 2018-06-14
#> 7 2018-06-17 01:56:13 9.4G  10093173146 2018-06-17
#> 8 2018-06-14 17:35:07 132G 141733920768 2018-06-14

## plot
# adjust for breaks
bks <- c('100K','1M','100M','1G','10G','100G')
p <- ggplot(restore.df, aes(x=as.Date(Start),y=FSize)) + geom_point()
p + scale_x_date(date_labels = "%Y-%m-%d", limits = c(as.Date('2018-06-14', "%Y-%m-%d"), 
             as.Date('2018-06-20', "%Y-%m-%d"))) + 
  scale_y_log10(breaks = sapply(bks, convert_size), labels = bks)

#Created on 2018-07-24 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0.9000).
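
As a complement to the hand-written `bks` labels, the reverse conversion (bytes back to a short label) can be sketched in base R. `format_size` is a hypothetical helper name, not part of the answer above or of any package; it assumes 1024-based units and non-negative input.

```r
# Hypothetical helper: format byte counts as short 1024-based labels.
format_size <- function(bytes) {
  units <- c('B', 'K', 'M', 'G', 'T')
  # findInterval() counts how many unit thresholds each value has reached
  idx <- findInterval(bytes, 1024^(1:4)) + 1
  scaled <- bytes / 1024^(idx - 1)
  # keep three significant digits, e.g. 6.2G or 132G
  paste0(signif(scaled, 3), units[idx])
}

labels <- format_size(c(147456, 318767104, 141733920768))
print(labels)
# [1] "144K" "304M" "132G"
```

Since ggplot2 scales accept a function for `labels`, something like `scale_y_log10(breaks = 1024^(1:4), labels = format_size)` should label the axis 1K, 1M, 1G, 1T without writing the strings by hand.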

Here is a simple way to sort the Size strings, assuming they are always 4 characters long (as in the example provided).

Size <- c('7.3G', '304M', '132G', '880K', '305M', '6.2G', '9.4G', '144K')

s1 <- substr(Size, 1, 3)  # numeric part, e.g. "7.3"
s2 <- substr(Size, 4, 4)  # unit letter, e.g. "G"
ii <- order(chartr('KMG', '123', s2), as.numeric(s1), Size)
print(Size[ii])

# [1] "144K" "880K" "304M" "305M" "6.2G" "7.3G" "9.4G" "132G"

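The order() function returns the ordering of Size based on the sort order of its first argument, then its second. The first is the unit letter K, M, or G, replaced by 1, 2, or 3 respectively, so that the units sort by magnitude. The second is the numeric value of the first three characters of the Size string. Size itself is passed last as a final tie-breaker.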
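
If the strings are not always 4 characters long, the same idea can be sketched with regular expressions instead of fixed positions. This is a variation on the approach above, not the answerer's code: the extra values '1.5T' and '12K' are made up for illustration, and it assumes every value carries a single trailing K, M, G, or T and stays below 1024 within its unit.

```r
# '1.5T' and '12K' are illustrative additions to the original sizes
Size <- c('7.3G', '304M', '132G', '880K', '305M', '6.2G', '9.4G', '144K', '1.5T', '12K')

unit  <- sub('^[0-9.]+', '', Size)                # trailing unit letter
value <- as.numeric(sub('[A-Za-z]+$', '', Size))  # leading number
rank  <- match(unit, c('K', 'M', 'G', 'T'))       # unit position, smallest first

# order by unit first, then by the number within each unit
ii <- order(rank, value)
print(Size[ii])
# [1] "12K"  "144K" "880K" "304M" "305M" "6.2G" "7.3G" "9.4G" "132G" "1.5T"
```
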

