Recode values to a new variable using R
I have a dataset with a variable that I need to anonymise by recoding it into a different variable. There are 20,000 entries, some of which are duplicated, so my data looks something like this:
DCD97568
DCD23547
DCD27656
DCD27656
DCD87590
The end product I want is a new variable that looks like this:
DCD00001
DCD00002
DCD00003
DCD00003
DCD00004
Thanks!
Update:
I need to deal with some NA entries in the original variable, and I want these to be NA in the new variable, so this
DCD14579
DCD21548
NA
DCD79131
DCD79131
DCD12313
would become
DCD00001
DCD00002
NA
DCD00003
DCD00003
DCD00004
We can do this with sprintf and match:
df1$Col1 <- sprintf("DCD%05d", match(df1$Col1, unique(df1$Col1)))
df1$Col1
#[1] "DCD00001" "DCD00002" "DCD00003" "DCD00003" "DCD00004"
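For the NA case in the question's update, formatting the index directly with sprintf would turn a missing index into text rather than a real NA. Here is a minimal sketch of one way around that, assuming the values sit in a plain character vector; the names ids, idx and new_ids are made up for illustration:

```r
# Example data from the question's update, including a missing value.
ids <- c("DCD14579", "DCD21548", NA, "DCD79131", "DCD79131", "DCD12313")

# Number the distinct non-missing values in order of first appearance.
# NA entries find no match in the lookup table, so their index is NA.
idx <- match(ids, unique(ids[!is.na(ids)]))

# Format only the non-missing positions; missing ones stay NA_character_.
new_ids <- ifelse(is.na(idx), NA_character_, sprintf("DCD%05d", idx))
new_ids
#[1] "DCD00001" "DCD00002" NA         "DCD00003" "DCD00003" "DCD00004"
```

The same pattern should work on a data frame column, for example by assigning the result to a new column instead of overwriting the original.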
Another option is factor:
with(df1, sprintf("DCD%05d", as.integer(factor(Col1, levels = unique(Col1)))))
Data used in the examples above:
df1 <- structure(list(Col1 = c("DCD97568", "DCD23547", "DCD27656", "DCD27656",
"DCD87590")), .Names = "Col1", class = "data.frame",
row.names = c(NA, -5L))
Using data.table's rleid. Thanks for the comments: rleid numbers runs of consecutive identical values, so the assumption here is that duplicates sit next to each other, or that the data has been sorted first:
x <- c("DCD97568",
"DCD23547",
"DCD27656",
"DCD27656",
"DCD87590")
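# sprintf zero-pads to five digits, so the result matches the requested
# "DCD00001" format and stays correct beyond nine distinct values
new <- sprintf("DCD%05d", data.table::rleid(x))
new
#[1] "DCD00001" "DCD00002" "DCD00003" "DCD00003" "DCD00004"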
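To illustrate why the ordering assumption matters, here is a small sketch that compares run-based numbering with match on a made-up vector where the duplicate is not adjacent. It uses base R's rle as a stand-in for run-length ids so that it runs without data.table; the vector y and the names run_id and match_id are hypothetical:

```r
# Hypothetical input where the duplicate "DCD27656" is NOT adjacent.
y <- c("DCD97568", "DCD27656", "DCD23547", "DCD27656")

# Run-based ids: one id per run of consecutive equal values.
runs <- rle(y)
run_id <- rep(seq_along(runs$values), runs$lengths)

# match() keys on the value itself, so repeats share an id wherever they occur.
match_id <- match(y, unique(y))

run_id
#[1] 1 2 3 4
match_id
#[1] 1 2 3 2
```

With run-based numbering the two occurrences of "DCD27656" get different ids (2 and 4), so for unsorted data the match approach is probably the safer choice.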