
Looking for a more concise way to recategorise a variable

I have a vector of integer ages that I want to turn into multiple categories:

ages <- round(runif(10, 0, 99))

Now I want this variable to be binned into three categories, depending on age. I want an output object, ages.cat, to look like this:

   young mid old
1      0   0   1
2      1   0   0
3      1   0   0
4      1   0   0
5      1   0   0
6      0   1   0
7      1   0   0
8      0   0   1
9      0   1   0
10     0   1   0

At present I am creating this object with the following code:

ages.cat <- array(0, dim=c(10,3)) # create categorical object for 3 bins
ages.cat[ages < 30, 1] <- 1
ages.cat[ages >= 30 & ages < 60, 2] <- 1
ages.cat[ages >= 60, 3] <- 1

ages.cat <- data.frame(ages.cat)
names(ages.cat) <- c("young", "mid", "old")

There must be a faster and more concise way to recode this data. I had a play with dplyr but couldn't see a solution to this particular problem with its functions. Any ideas? What would be the 'canonical' solution to this problem in base R or using a package? Whatever the alternatives, I'm certain they'll be more concise than my clunky code!

It's two one-liners.

Use cut to create a factor:

ages <- round(runif(10, 0, 99))
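# right = FALSE makes the bins [30, 60) etc., matching ages >= 30 & ages < 60
ageF = cut(ages, c(-Inf, 30, 60, Inf), labels = c("young", "mid", "old"), right = FALSE)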
> ageF
 [1] young mid   young young old   mid   old   young old   old  
Levels: young mid old
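
As a quick check of the bin edges, here is a small sketch using a fixed vector (ages_fixed and its values are made up for illustration, not taken from the question). By default cut() builds right-closed intervals, so an age of exactly 30 lands in the first bin; passing right = FALSE gives left-closed bins, which matches the ages < 30 / ages >= 30 rule in the question.

```r
# Hypothetical fixed ages chosen to sit on the bin edges
ages_fixed <- c(29, 30, 59, 60)
breaks <- c(-Inf, 30, 60, Inf)
labs <- c("young", "mid", "old")

# Default (right = TRUE): intervals are (-Inf, 30], (30, 60], (60, Inf]
default_bins <- cut(ages_fixed, breaks, labels = labs)

# right = FALSE: intervals are [-Inf, 30), [30, 60), [60, Inf)
left_closed_bins <- cut(ages_fixed, breaks, labels = labs, right = FALSE)

print(data.frame(age = ages_fixed,
                 default = default_bins,
                 left_closed = left_closed_bins))
```

With whole-number ages the two settings only disagree at exactly 30 and 60, so the difference is easy to miss on random data.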

Usually you'd leave that as a factor and work with it; if you are using R's modelling functions, they'll work out the matrix for you. But if you are doing it yourself:

Use model.matrix to create the matrix, with a -1 to remove the intercept and create columns for each level:

> m = model.matrix(~ageF-1)
> m
   ageFyoung ageFmid ageFold
1          1       0       0
2          0       1       0
3          1       0       0
4          1       0       0
5          0       0       1
6          0       1       0
7          0       0       1
8          1       0       0
9          0       0       1
10         0       0       1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$ageF
[1] "contr.treatment"

You can ignore all the contrasty stuff at the end; it's just a matrix with some extra attributes for modelling.
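
If you want the data frame from the question, with columns named young, mid and old, one possible follow-up is sketched below. The fixed ages vector is made up so the result is reproducible, and the renaming step assumes every column of the matrix corresponds to one factor level, in level order.

```r
# Fixed example ages (made up) so the result is reproducible
ages <- c(72, 12, 25, 45, 60, 30)
ageF <- cut(ages, c(-Inf, 30, 60, Inf),
            labels = c("young", "mid", "old"), right = FALSE)

# One indicator column per level; -1 removes the intercept
m <- model.matrix(~ ageF - 1)

# Convert the matrix to a data frame with one 0/1 column per level
ages.cat <- as.data.frame(m)

# Replace the "ageFyoung"-style names with the bare level names
names(ages.cat) <- levels(ageF)

print(ages.cat)
```

Each row should contain a single 1, in the column for that person's age group.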

Try this:

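library(dplyr)
library(reshape2)  # provides dcast()

# %>% replaces the older %.% chaining operator, which dplyr no longer provides
ages <- 
  data.frame(ages = round(runif(10, 0, 99))) %>%
  mutate(id = 1:n(), 
         cat = factor(ifelse(ages < 30, "young",
                             ifelse(ages >= 30 & ages < 60, 
                                    "mid", "old")),
                      levels = c("young", "mid", "old"))) %>%
  # count rows per id and category, giving one 0/1 column per category
  dcast(id ~ cat, value.var = 'ages', fun.aggregate = length)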
