R: How to extract the factor levels as numbers from a column and assign them to a new column using tidyverse?
Suppose I have a data frame, df:
df = data.frame(name = rep(c("A", "B", "C"), each = 4))
I want to get a new data frame with one additional column named Group, in which each Group element is the numeric value of the corresponding level of name, as shown in df2.
I know case_when could do it. My issue is that my real data frame is quite complicated: the name column has many levels, and I am too lazy to list them case by case.
Is there an easier and smarter way to do it?
Thanks.
df2
name Group
1 A 1
2 A 1
3 A 1
4 A 1
5 B 2
6 B 2
7 B 2
8 B 2
9 C 3
10 C 3
11 C 3
12 C 3
There are a few ways to do this in the tidyverse:
library(tidyverse)
df %>% group_by(name) %>% mutate(Group = cur_group_id()) %>% ungroup()
or
df %>% mutate(Group = as.numeric(as.factor(name)))
name Group
1 A 1
2 A 1
3 A 1
4 A 1
5 B 2
6 B 2
7 B 2
8 B 2
9 C 3
10 C 3
11 C 3
12 C 3
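Both approaches number the groups in sorted (alphabetical) order of name, because group_by() sorts its groups and factor() sorts its levels by default. group_by() also leaves the result grouped, so a trailing ungroup() is usually wise. A minimal sketch, assuming the same df as in the question; df_grouped, df_sorted and df_appearance are illustrative names only:

```r
library(dplyr)

df <- data.frame(name = rep(c("A", "B", "C"), each = 4))

# group_by() numbers groups in sorted order of `name`;
# ungroup() drops the grouping so later verbs are not applied per group
df_grouped <- df %>%
  group_by(name) %>%
  mutate(Group = cur_group_id()) %>%
  ungroup()

# factor() sorts its levels alphabetically by default, and as.integer()
# returns each value's position among those levels
df_sorted <- df %>%
  mutate(Group = as.integer(factor(name)))

# Passing levels = unique(name) orders the levels by first appearance instead
df_appearance <- df %>%
  mutate(Group = as.integer(factor(name, levels = unique(name))))
```

For this df all three give the same Group column as df2; they only differ when the alphabetical order and the order of appearance disagree.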
A couple of other simple solutions:
library(dplyr)
df %>%
  mutate(Group = match(name, unique(name)))
#> name Group
#> 1 A 1
#> 2 A 1
#> 3 A 1
#> 4 A 1
#> 5 B 2
#> 6 B 2
#> 7 B 2
#> 8 B 2
#> 9 C 3
#> 10 C 3
#> 11 C 3
#> 12 C 3
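match(name, unique(name)) numbers each name by its first appearance rather than alphabetically, which matters once the rows are not sorted. A small sketch with a hypothetical unsorted data frame df_mixed, comparing it with the factor-code approach:

```r
library(dplyr)

# Hypothetical unsorted data: "C" comes first and "A" reappears later
df_mixed <- data.frame(name = c("C", "C", "A", "B", "A"))

df_mixed <- df_mixed %>%
  mutate(
    by_appearance = match(name, unique(name)),  # C = 1, A = 2, B = 3
    by_level      = as.integer(factor(name))    # A = 1, B = 2, C = 3
  )

df_mixed
```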
df %>%
  mutate(Group = cumsum(name != lag(name, default = "")))
#> name Group
#> 1 A 1
#> 2 A 1
#> 3 A 1
#> 4 A 1
#> 5 B 2
#> 6 B 2
#> 7 B 2
#> 8 B 2
#> 9 C 3
#> 10 C 3
#> 11 C 3
#> 12 C 3
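The cumsum() approach starts a new number every time name changes from one row to the next, so it reproduces df2 only when identical names sit in contiguous runs; a name that reappears later gets a fresh number. dplyr 1.1.0 and later also provide consecutive_id(), which does the same run-based numbering. A sketch, assuming dplyr >= 1.1.0 and a hypothetical data frame df_runs:

```r
library(dplyr)  # consecutive_id() requires dplyr >= 1.1.0

# Hypothetical data in which "A" appears in two separate runs
df_runs <- data.frame(name = c("A", "A", "B", "A"))

df_runs <- df_runs %>%
  mutate(
    by_cumsum = cumsum(name != lag(name, default = "")),  # new id at each change
    by_run    = consecutive_id(name),                     # same run-based ids
    by_name   = match(name, unique(name))                 # one id per distinct name
  )

df_runs
```

Here the second run of "A" gets 3 from the run-based methods but 1 from match(), so pick the method that matches what Group is supposed to mean.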
data.table
df = data.frame(name = rep(c("A", "B", "C"), each = 4))
library(data.table)
setDT(df)[, grp := .GRP, by = name][]
#> name grp
#> 1: A 1
#> 2: A 1
#> 3: A 1
#> 4: A 1
#> 5: B 2
#> 6: B 2
#> 7: B 2
#> 8: B 2
#> 9: C 3
#> 10: C 3
#> 11: C 3
#> 12: C 3
Created on 2022-02-10 by the reprex package (v2.0.1)
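With by =, data.table keeps groups in order of first appearance, so .GRP behaves like match(name, unique(name)). For the run-based numbering of the cumsum() answer, data.table also has rleid(). A sketch on a hypothetical unsorted table dt:

```r
library(data.table)

# Hypothetical unsorted data: "C" comes first and "A" appears in two runs
dt <- data.table(name = c("C", "C", "A", "B", "A"))

# := adds columns by reference; .GRP is the group counter for by = name
dt[, grp := .GRP, by = name]   # C = 1, A = 2, B = 3 (order of first appearance)

# rleid() starts a new id whenever the value changes between rows
dt[, run := rleid(name)]

dt
```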