

How to categorize a vector in R to draw a pie chart

I want to categorize the rivers dataset into “tiny” (<500), “short” (<1500), “medium” (<3000) and “long” (>=3000). I want to plot a pie chart that visualizes the frequency of these four categories.

I tried:

 rivers[rivers >= 3000] = 'long'
 rivers[rivers >= 1500 & rivers < 3000] = 'medium'
 rivers[rivers >= 500 & rivers < 1500] = 'short'
 rivers[rivers < 500] = 'tiny'

It seems the third command has no effect on the data: those values are the same as before!

   500    505    524    525    529    538    540    545    560    570    600    605 
     2      1      1      2      1      1      1      1      1      1      3      1 
   610    618    620    625    630    652    671    680    696    710    720    730 
     1      1      1      1      1      1      1      1      1      1      2      1 
   735    760    780    800    840    850    870    890    900    906    981   long 
     2      1      1      1      1      1      1      1      2      1      1      1 
medium   tiny 
    36     62 

What is wrong with my commands, and is this the right way to draw a pie chart of the categories?

The cut function can easily perform this task:

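# random data
rivers <- runif(20, 0, 5000)

# break into the desired groups and label them;
# right = FALSE closes the intervals on the left: [0, 500), [500, 1500), ...
answer <- cut(rivers, breaks = c(0, 500, 1500, 3000, Inf),
              labels = c("tiny", "short", "medium", "long"), right = FALSE)

table(answer)
#   tiny  short medium   long 
#      1     10      7      2 
# (example counts; they vary between runs because the data are random)

pie(table(answer))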
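The right = FALSE argument is what places the boundary values in the groups the question asks for. Below is a small check with hand-picked values that sit on or next to the breaks. These values are chosen for illustration and are not taken from the question. The check compares right = FALSE with cut's default, right = TRUE:

```r
# hand-picked lengths on or next to the boundaries (illustrative only)
x <- c(0, 499, 500, 1499, 1500, 2999, 3000, 10000)
brks <- c(0, 500, 1500, 3000, Inf)
labs <- c("tiny", "short", "medium", "long")

# right = FALSE: intervals [0,500), [500,1500), [1500,3000), [3000,Inf)
left_closed <- cut(x, breaks = brks, labels = labs, right = FALSE)

# right = TRUE (the default): intervals (0,500], (500,1500], (1500,3000], (3000,Inf]
right_closed <- cut(x, breaks = brks, labels = labs)

print(data.frame(x, left_closed, right_closed))
```

With the default, 500 is labelled "tiny" and 3000 "medium", which does not match the "tiny (<500)" and "long (>=3000)" rules. The value 0 also becomes NA because the lowest break is excluded unless include.lowest = TRUE is set.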

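You are running into this problem because you are assigning character values to a numeric vector. The first assignment coerces the whole of rivers to character. Every later comparison, such as rivers >= 500, therefore compares strings alphabetically instead of comparing numbers (for example, "1000" < "500" is TRUE). That is why the third command matches nothing: no string sorts both at or after "500" and before "1500". If you write the labels into a separate character vector and keep comparing against the original numeric rivers, it works: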

> rivers_size <- as.character(rivers)
> rivers_size[rivers >= 3000] = 'long'
> rivers_size[rivers >= 1500 & rivers < 3000] = 'medium'
> rivers_size[rivers >= 500 & rivers < 1500] = 'short'
> rivers_size[rivers < 500] = 'tiny'
> table(rivers_size)
  long medium  short   tiny 
     1      5     53     82 
> pie(table(rivers_size))
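The failure in the question comes from R's coercion rules. This minimal sketch uses four made-up lengths. It shows that one string assignment turns the whole vector into character. After that, >= and < compare text alphabetically, so "1000" sorts before "500":

```r
x <- c(600, 1000, 2000, 3500)   # made-up river lengths
x[x >= 3000] <- "long"          # one string forces the whole vector to character
print(class(x))                 # "character"
print(x)                        # "600" "1000" "2000" "long"

# 500 and 1500 are now converted to "500" and "1500" and compared as text
print("1000" < "500")           # TRUE, because "1" sorts before "5"
print(x >= 500 & x < 1500)      # all FALSE: nothing is both >= "500" and < "1500"
```

The answer's fix avoids this because the comparisons always use the untouched numeric rivers, and only the separate rivers_size vector holds text.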


Alternatively, the same thing can be accomplished using cut (as @Dave2e shows):

rivers <- cut(datasets::rivers,
              breaks = c(0, 500, 1500, 3000, Inf), 
              labels = c("tiny", "short", "medium", "long"),
              right = FALSE)
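Applied to the real datasets::rivers lengths (in miles), the factor from cut can go straight into table() and pie(). This sketch also puts the counts into the slice labels. The label format and the title are illustrative choices, not part of the original answer:

```r
# categorize the built-in river lengths (miles)
sizes <- cut(datasets::rivers,
             breaks = c(0, 500, 1500, 3000, Inf),
             labels = c("tiny", "short", "medium", "long"),
             right = FALSE)

# table() keeps the factor-level order, so the slices run tiny -> long
counts <- table(sizes)
print(counts)

# label each slice with its category and its count
pie(counts,
    labels = paste0(names(counts), " (", counts, ")"),
    main = "Lengths of North American rivers")
```

Because cut returns a factor, the category order is fixed by labels, not by alphabetical sorting.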

Here is another alternative using dplyr::case_when. It is more verbose than cut, but it is also easier to generalize.


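library(dplyr)

set.seed(1234) # for reproducibility

# simulated river lengths
rivers <- sample.int(5000, size = 1000, replace = TRUE)

# `case_when` vectorizes multiple `if-else` statements;
# conditions are checked in order and the first match wins
rivers <- case_when(
  rivers >= 3000 ~ "long",
  rivers >= 1500 ~ "medium",
  rivers >= 500  ~ "short",
  TRUE ~ "tiny"
)
table(rivers)
#> rivers
#>   long medium  short   tiny 
#>    406    303    199     92
# (output from the original reprex; sample() changed its default
#  algorithm in R 3.6.0, so newer versions may give different counts)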

Created on 2019-04-10 by the reprex package (v0.2.1)
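One practical difference from cut is that case_when returns a plain character vector. As a result, table() sorts the categories alphabetically (long, medium, short, tiny). The sketch below applies the same rules to datasets::rivers and converts the result to a factor, so the pie slices follow size order. The data frame rivers_df and its column names are hypothetical:

```r
library(dplyr)

# hypothetical data frame holding the real river lengths
rivers_df <- data.frame(length = as.numeric(datasets::rivers)) %>%
  mutate(
    size = case_when(
      length >= 3000 ~ "long",
      length >= 1500 ~ "medium",
      length >= 500  ~ "short",
      TRUE ~ "tiny"
    ),
    # fix the level order; otherwise table() would sort alphabetically
    size = factor(size, levels = c("tiny", "short", "medium", "long"))
  )

counts <- table(rivers_df$size)
print(counts)
pie(counts)
```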
