
R: How to extract the factor levels as numeric from a column and assign them to a new column using tidyverse?

Suppose I have a data frame, df:

df = data.frame(name = rep(c("A", "B", "C"), each = 4))

I want to get a new data frame with one additional column named Group, in which each element is the numeric value of the corresponding level of name, as shown in df2.

I know case_when could do it, but my real data frame is more complicated: the name column has many levels, and I would rather not list them case by case.

Is there an easier and smarter way to do it?

Thanks.

df2
   name Group
1     A     1
2     A     1
3     A     1
4     A     1
5     B     2
6     B     2
7     B     2
8     B     2
9     C     3
10    C     3
11    C     3
12    C     3

There are a few ways to do it in tidyverse:

library(tidyverse)

df %>% group_by(name) %>% mutate(Group = cur_group_id())

or

df %>% mutate(Group = as.numeric(as.factor(name)))

Output

   name Group
1     A     1
2     A     1
3     A     1
4     A     1
5     B     2
6     B     2
7     B     2
8     B     2
9     C     3
10    C     3
11    C     3
12    C     3
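
Both approaches above number the groups by sorted level order, which may not be the order you want. As a minimal sketch (written in base R so it runs without extra packages; df_demo and the column GroupCustom are hypothetical names used only for illustration), passing an explicit levels argument to factor() controls which code each value receives:

```r
# Hypothetical example data: rows are deliberately not in alphabetical order
df_demo <- data.frame(name = c("C", "C", "A", "B", "A"))

# Default: levels are sorted, so A = 1, B = 2, C = 3
df_demo$Group <- as.integer(factor(df_demo$name))

# Explicit levels: codes follow the order given, so C = 1, A = 2, B = 3
df_demo$GroupCustom <- as.integer(factor(df_demo$name, levels = c("C", "A", "B")))

print(df_demo)
```

If name is already a factor with the level order you need, as.integer(name) returns its level codes directly.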

A couple of other simple solutions:

library(dplyr)

df %>%
  mutate(Group = match(name, unique(name)))
#>    name Group
#> 1     A     1
#> 2     A     1
#> 3     A     1
#> 4     A     1
#> 5     B     2
#> 6     B     2
#> 7     B     2
#> 8     B     2
#> 9     C     3
#> 10    C     3
#> 11    C     3
#> 12    C     3

df %>%
  mutate(Group = cumsum(name != lag(name, default = "")))
#>    name Group
#> 1     A     1
#> 2     A     1
#> 3     A     1
#> 4     A     1
#> 5     B     2
#> 6     B     2
#> 7     B     2
#> 8     B     2
#> 9     C     3
#> 10    C     3
#> 11    C     3
#> 12    C     3
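
These two solutions agree on the example data only because identical names sit in consecutive rows. The sketch below, using a hypothetical unsorted data frame df_unsorted (not part of the original question), shows where they diverge: match() assigns one number per distinct value in order of first appearance, while the cumsum()/lag() version starts a new number every time the value changes.

```r
library(dplyr)

# Hypothetical unsorted data in which "B" reappears after "A"
df_unsorted <- data.frame(name = c("B", "B", "A", "A", "B", "C"))

res <- df_unsorted %>%
  mutate(
    # one id per distinct value, numbered by first appearance
    by_first_seen = match(name, unique(name)),
    # one id per consecutive run of equal values
    by_run = cumsum(name != lag(name, default = ""))
  )

print(res)
```

Here the second block of "B" gets by_first_seen = 1 again but by_run = 3, so the cumsum()/lag() form is only suitable when the column is sorted or when run ids are what you actually want.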

data.table

df = data.frame(name = rep(c("A", "B", "C"), each = 4))

library(data.table)
setDT(df)[, grp := .GRP, by = name][]
#>     name grp
#>  1:    A   1
#>  2:    A   1
#>  3:    A   1
#>  4:    A   1
#>  5:    B   2
#>  6:    B   2
#>  7:    B   2
#>  8:    B   2
#>  9:    C   3
#> 10:    C   3
#> 11:    C   3
#> 12:    C   3
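
Note that setDT() converts df to a data.table in place. If the original data frame should stay untouched, one possible variant (a sketch; the name dt is chosen here only for illustration) is to work on a copy made with as.data.table():

```r
library(data.table)

df <- data.frame(name = rep(c("A", "B", "C"), each = 4))

# as.data.table() returns a copy, so df remains a plain data.frame
dt <- as.data.table(df)

# .GRP is the group counter: 1 for the first group encountered, 2 for the next, ...
dt[, grp := .GRP, by = name]

print(dt)
print(class(df))
```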

Created on 2022-02-10 by the reprex package (v2.0.1)

