Looking for a more concise way to recategorise a variable
I have a vector of integer ages that I want to turn into multiple categories:
ages <- round(runif(10, 0, 99))
Now I want this variable to be binned into three categories, depending on age. I want an output object, ages.cat, to look like this:
   young mid old
1      0   0   1
2      1   0   0
3      1   0   0
4      1   0   0
5      1   0   0
6      0   1   0
7      1   0   0
8      0   0   1
9      0   1   0
10     0   1   0
At present I am creating this object with the following code:
ages.cat <- array(0, dim=c(10,3)) # create categorical object for 3 bins
ages.cat[ages < 30, 1] <- 1
ages.cat[ages >= 30 & ages < 60, 2] <- 1
ages.cat[ages >= 60, 3] <- 1
ages.cat <- data.frame(ages.cat)
names(ages.cat) <- c("young", "mid", "old")
There must be a faster and more concise way to recode this data. I had a play with dplyr but couldn't see a solution to this particular problem with its functions. Any ideas? What would be the 'canonical' solution to this problem in base R or using a package? Whatever the alternatives, I'm certain they'll be more concise than my clunky code!
It's two one-liners.
Use cut to create a factor:
ages <- round(runif(10, 0, 99))
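# right = FALSE gives bins [-Inf,30), [30,60), [60,Inf), matching ages < 30 / ages >= 60 in the question
ageF <- cut(ages, c(-Inf, 30, 60, Inf), labels = c("young", "mid", "old"), right = FALSE)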
> ageF
[1] young mid young young old mid old young old old
Levels: young mid old
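How cut treats ages that sit exactly on a break is worth checking, since the question bins on ages < 30 and ages >= 60. The following is a small sketch on hand-picked boundary values (the vector boundary_ages and the two result names are illustrative, not from the original post). By default cut closes intervals on the right, so an age of exactly 30 lands in the first bin; passing right = FALSE closes them on the left instead.

```r
# Hypothetical boundary values chosen to probe the break points
boundary_ages <- c(29, 30, 59, 60)
breaks <- c(-Inf, 30, 60, Inf)
bin_labels <- c("young", "mid", "old")

# Default (right = TRUE): intervals are (-Inf,30], (30,60], (60,Inf]
default_bins <- cut(boundary_ages, breaks, labels = bin_labels)
as.character(default_bins)
# [1] "young" "young" "mid"   "mid"

# right = FALSE: intervals are [-Inf,30), [30,60), [60,Inf)
left_closed <- cut(boundary_ages, breaks, labels = bin_labels, right = FALSE)
as.character(left_closed)
# [1] "young" "mid"   "mid"   "old"
```

With integer ages drawn at random the two variants will often agree, so the difference only shows up when a value hits 30 or 60 exactly.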
Usually you'd leave that as a factor and work with it; if you are using R's modelling functions, they'll work out the matrix for you. But if you are doing it yourself:
Use model.matrix to create the matrix, with a -1 to remove the intercept and create columns for each level:
> m = model.matrix(~ageF-1)
> m
   ageFyoung ageFmid ageFold
1          1       0       0
2          0       1       0
3          1       0       0
4          1       0       0
5          0       0       1
6          0       1       0
7          0       0       1
8          1       0       0
9          0       0       1
10         0       0       1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$ageF
[1] "contr.treatment"
You can ignore all the contrasty stuff at the end; it's just a matrix with some extra attributes for modelling.
Try this:
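If the goal is the exact ages.cat data frame from the question, with plain column names and no modelling attributes, one possible follow-up step is sketched below. It assumes the cut breaks used above with right = FALSE; the set.seed call is only there to make the sketch reproducible and is not part of the original answer.

```r
set.seed(1)  # assumption: fixed seed so the example is reproducible
ages <- round(runif(10, 0, 99))
ageF <- cut(ages, c(-Inf, 30, 60, Inf),
            labels = c("young", "mid", "old"), right = FALSE)

# One indicator column per factor level; -1 drops the intercept
m <- model.matrix(~ ageF - 1)

# Converting to a data frame discards the "assign"/"contrasts" attributes;
# the columns come out in level order, so the level names can replace them
ages.cat <- as.data.frame(m)
names(ages.cat) <- levels(ageF)

ages.cat
```

Each row of ages.cat then contains a single 1 in the column for that person's age group.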
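library(dplyr)
library(reshape2)  # provides dcast()

# %>% replaces the older %.% chaining operator, which is no longer in dplyr
ages <-
  data.frame(ages = round(runif(10, 0, 99))) %>%
  mutate(id = 1:n(),
         # explicit levels keep the columns in young/mid/old order
         cat = factor(ifelse(ages < 30, "young",
                             ifelse(ages >= 30 & ages < 60,
                                    "mid", "old")),
                      levels = c("young", "mid", "old"))) %>%
  # one row per id, one column per category, counting occurrences (0 or 1)
  dcast(id ~ cat, value.var = "ages", fun.aggregate = length)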