Create column with grouped values based on another column
I'm sure this has been asked before, but I don't know what to search for, so I apologise in advance.
Let's say that I have the following data frame:
grades <- data.frame(a = 1:40, b = sample(45:100, 40))
Using dplyr, I want to create a new variable that indicates the grade the student received, based on the following criteria: 90-100 = excellent, 80-90 = very good, etc.
I thought I could get that result by nesting ifelse() inside of mutate():
grades %>%
mutate(ifelse(b >= 90, "excellent"),
ifelse(b >= 80 & b < 90, "very_good"),
ifelse(b >= 70 & b < 80, "fair"),
ifelse(b >= 60 & b < 70, "poor", "fail"))
This doesn't work: I get the error message 'argument "no" is missing, with no default'. I thought the "no" would be the "fail" at the end, but obviously I'm getting the syntax wrong.
I can get this to work if I first filter the original data individually, and then call ifelse, as follows:
a <- grades %>%
filter( b >= 90) %>%
mutate(final = ifelse(b >= 90, "excellent"))
and then rbind a, b, c, etc. Obviously, this isn't how I want to do it, but I wanted to understand the syntax of ifelse(). I'm guessing the latter works because there aren't any values that fail the criteria, but I still can't figure out how to get it to work when there is more than one ifelse.
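The guess at the end of the question can be checked with a small base R sketch. The score vectors below are made up for illustration and are not taken from the grades data frame. ifelse(test, yes, no) only evaluates no when at least one element of test is FALSE, which would explain why the filtered version runs and the unfiltered one does not.

```r
# Both tests are TRUE, so `no` is never evaluated and its absence goes unnoticed.
all_true <- ifelse(c(95, 92) >= 90, "excellent")

# One test is FALSE, so ifelse() has to evaluate `no`, which was never supplied.
err_msg <- tryCatch(
  ifelse(c(95, 72) >= 90, "excellent"),
  error = function(e) conditionMessage(e)
)

print(all_true)
print(err_msg)
```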
Define vectors with the levels and labels, then use cut on the b column:
levels <- c(-Inf, 60, 70, 80, 90, Inf)
labels <- c("Fail", "Poor", "fair", "very good", "excellent")
grades %>% mutate(x = cut(b, levels, labels = labels))
a b x
1 1 66 Poor
2 2 78 fair
3 3 97 excellent
4 4 46 Fail
5 5 89 very good
6 6 57 Fail
7 7 80 fair
8 8 98 excellent
9 9 100 excellent
10 10 93 excellent
11 11 59 Fail
12 12 51 Fail
13 13 69 Poor
14 14 75 fair
15 15 72 fair
16 16 48 Fail
17 17 74 fair
18 18 54 Fail
19 19 62 Poor
20 20 64 Poor
21 21 88 very good
22 22 70 Poor
23 23 85 very good
24 24 58 Fail
25 25 95 excellent
26 26 56 Fail
27 27 65 Poor
28 28 68 Poor
29 29 91 excellent
30 30 76 fair
31 31 82 very good
32 32 55 Fail
33 33 96 excellent
34 34 83 very good
35 35 61 Poor
36 36 60 Fail
37 37 77 fair
38 38 47 Fail
39 39 73 fair
40 40 71 fair
Or using data.table:
library(data.table)
setDT(grades)[, x := cut(b, levels, labels)]
Or simply in base R:
grades$x <- cut(grades$b, levels, labels)
After taking another close look at your initial approach, I noticed that you would need to include right = FALSE in the cut call because, for example, 90 points should be "excellent", not just "very good". The right argument defines on which side each interval is closed (left or right). The default is closed on the right, which differs slightly from the OP's initial approach. So in dplyr, it would then be:
grades %>% mutate(x = cut(b, levels, labels, right = FALSE))
and likewise for the other options.
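To see what right = FALSE changes, here is a minimal base R sketch that applies both settings to a handful of boundary scores. The vector b below is an illustrative assumption, not the sampled data from the question. The levels and labels are the ones defined above.

```r
# Illustrative boundary scores: each cut point plus values on either side
b <- c(45, 59, 60, 69, 70, 79, 80, 89, 90, 100)

levels <- c(-Inf, 60, 70, 80, 90, Inf)
labels <- c("Fail", "Poor", "fair", "very good", "excellent")

# Default: intervals are closed on the right, e.g. (80, 90]
right_closed <- cut(b, levels, labels = labels)

# right = FALSE: intervals are closed on the left, e.g. [90, Inf)
left_closed <- cut(b, levels, labels = labels, right = FALSE)

comparison <- data.frame(b, right_closed, left_closed)
print(comparison)
```

The two columns should differ only on the rows where b is exactly 60, 70, 80 or 90, which are the scores that move up one grade with right = FALSE.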
All of the ifelse calls need to be nested within each other. Try this:
grades %>%
  mutate(grade = ifelse(b >= 90, "excellent",
                 ifelse(b >= 80 & b < 90, "very_good",
                 ifelse(b >= 70 & b < 80, "fair",
                 ifelse(b >= 60 & b < 70, "poor", "fail")))))
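As a rough check of the nesting logic outside of mutate(), the same chain can be run on a plain vector in base R. The scores below are made up for illustration. Because each inner ifelse() is only reached when the outer test was FALSE, the upper bounds (b < 90 and so on) are redundant and are dropped in this sketch.

```r
# Illustrative scores covering every grade boundary
b <- c(45, 60, 70, 80, 90, 100)

# Each `no` argument is the next ifelse(); the innermost one supplies "fail"
grade <- ifelse(b >= 90, "excellent",
         ifelse(b >= 80, "very_good",
         ifelse(b >= 70, "fair",
         ifelse(b >= 60, "poor", "fail"))))

print(data.frame(b, grade))
```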
grades$c <- NA_character_ # creating a new (character) column
# and filling in the grades, always comparing against the numeric column b
grades$c[grades$b >= 90] <- "excellent"
grades$c[grades$b < 90 & grades$b >= 80] <- "very good"
grades$c[grades$b < 80 & grades$b >= 70] <- "fair"
grades$c[grades$b < 70 & grades$b >= 60] <- "poor"
grades$c[grades$b < 60] <- "fail"