Recode values to a new variable using R

I have a dataset with a variable that I need to anonymise by recoding it into a new variable. There are 20,000 entries, some of which are duplicated, so my data looks something like this:

DCD97568
DCD23547
DCD27656
DCD27656
DCD87590

The end product I want is a new variable that looks like this:

DCD00001
DCD00002
DCD00003
DCD00003
DCD00004

Thanks!

Update:

I need to deal with some NA entries in the original variable, and I want these to be NA in the new variable, so this

DCD14579
DCD21548
NA
DCD79131
DCD79131
DCD12313

would become

DCD00001
DCD00002
NA
DCD00003
DCD00003
DCD00004

We can do this with sprintf and match:

df1$Col1 <- sprintf("DCD%05d", match(df1$Col1, unique(df1$Col1)))
df1$Col1
#[1] "DCD00001" "DCD00002" "DCD00003" "DCD00003" "DCD00004"

Another option is factor:

with(df1, sprintf("DCD%05d", as.integer(factor(Col1, levels = unique(Col1)))))

Data:

df1 <- structure(list(Col1 = c("DCD97568", "DCD23547", "DCD27656", "DCD27656", 
"DCD87590")), .Names = "Col1", class = "data.frame",
 row.names = c(NA, -5L))
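
The update asks for NA entries to stay NA. As far as I can tell, the one-liners above do not do that as written: match() treats NA as a value that matches itself, so a missing entry would receive its own code, and sprintf() formats a missing integer as a padded "NA" string instead of a true missing value. A minimal sketch of one way to handle this, using the sample data from the update (the names x, idx and new_id are only illustrative):

```r
# Sample data from the question's update, including a missing value
x <- c("DCD14579", "DCD21548", NA, "DCD79131", "DCD79131", "DCD12313")

# Number the distinct non-missing IDs in order of first appearance;
# match() returns NA for NA inputs because NA is not in the lookup table
idx <- match(x, unique(x[!is.na(x)]))

# Keep missing positions as real NA_character_ values; sprintf() alone
# would turn them into a padded "NA" string
new_id <- ifelse(is.na(idx), NA_character_, sprintf("DCD%05d", idx))
new_id
```

Assigning the result to a new column (for example df1$NewCol, a hypothetical name) rather than overwriting Col1 keeps the original IDs available until the anonymised version has been checked.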

Using data.table's rleid (thanks for the comments). This assumes that identical values are adjacent, i.e. the data is already in sequence or has been sorted first:

x <- c("DCD97568",
       "DCD23547",
       "DCD27656",
       "DCD27656",
       "DCD87590")

# Zero-pad the run id to five digits so the codes match the requested format
new <- sprintf("DCD%05d", data.table::rleid(x))

> new
[1] "DCD00001" "DCD00002" "DCD00003" "DCD00003"
[5] "DCD00004"
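
The adjacency assumption matters because a run-length id starts a new number every time the value changes. The sketch below illustrates this with a base-R stand-in for a run-length id (it is not data.table's implementation, and the sample vector is made up for illustration), compared with the match-based lookup:

```r
# The duplicate "DCD27656" is NOT adjacent to its first occurrence here
x <- c("DCD97568", "DCD27656", "DCD23547", "DCD27656")

# Base-R stand-in for a run-length id: a new id starts whenever the value changes
run_id <- cumsum(c(TRUE, x[-1] != x[-length(x)]))

# Lookup by first appearance, which does not depend on row order
match_id <- match(x, unique(x))

sprintf("DCD%05d", run_id)    # the two "DCD27656" rows get different codes
sprintf("DCD%05d", match_id)  # the two "DCD27656" rows share one code
```

For unsorted data, either sort first or prefer the match-based approach, which also preserves the original row order.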
