
Recode values to a new variable using R

I have a dataset with a variable that I need to anonymise by recoding it into a different variable. There are 20,000 entries, some of which are duplicated, so my data looks something like this:

DCD97568
DCD23547
DCD27656
DCD27656
DCD87590

The end product I want is a new variable that looks like this:

DCD00001
DCD00002
DCD00003
DCD00003
DCD00004

Thanks!

Update:

I need to deal with some NA entries in the original variable, and I want these to be NA in the new variable, so this

DCD14579
DCD21548
NA
DCD79131
DCD79131
DCD12313

would become

DCD00001
DCD00002
NA
DCD00003
DCD00003
DCD00004

We can do this with sprintf and match:

df1$Col1 <- sprintf("DCD%05d", match(df1$Col1, unique(df1$Col1)))
df1$Col1
#[1] "DCD00001" "DCD00002" "DCD00003" "DCD00003" "DCD00004"
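
The update to the question asks for NA entries to stay NA. With match as written above, an NA in the column is matched like any other value and gets its own number, and sprintf would format a missing index as text rather than return a true NA. A minimal sketch of one way around that, assuming the column is a character vector (the names x, ids and new are only illustrative):

```r
# Example input taken from the question's update
x <- c("DCD14579", "DCD21548", NA, "DCD79131", "DCD79131", "DCD12313")

# Build the lookup table from non-missing values only,
# so match() returns NA for the missing entries
ids <- match(x, unique(x[!is.na(x)]))

# Format the index, but keep a real NA where the input was NA
new <- ifelse(is.na(ids), NA_character_, sprintf("DCD%05d", ids))
new
#[1] "DCD00001" "DCD00002" NA         "DCD00003" "DCD00003" "DCD00004"
```

Numbering follows order of first appearance, so the NA does not use up a number.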

Or another option is factor

with(df1, sprintf("DCD%05d", as.integer(factor(Col1, levels = unique(Col1)))))

data

df1 <- structure(list(Col1 = c("DCD97568", "DCD23547", "DCD27656", "DCD27656", 
"DCD87590")), .Names = "Col1", class = "data.frame",
 row.names = c(NA, -5L))

Using rleid from data.table (thanks for the comments). The assumption here is that duplicates are adjacent, i.e. the data is in sequence; otherwise this can be used once the data is sorted:

x <- c("DCD97568",
       "DCD23547",
       "DCD27656",
       "DCD27656",
       "DCD87590")

# sprintf zero-pads the run id to five digits, so ids of 10 and above keep the same width
new <- sprintf("DCD%05d", data.table::rleid(x))

> new
[1] "DCD00001" "DCD00002" "DCD00003" "DCD00003"
[5] "DCD00004"
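
The adjacency assumption matters because run-length ids restart whenever a value changes, so a duplicate that appears later in the data gets a new code. A small sketch of the difference, using base R's rle as a stand-in for data.table::rleid so it runs without the package (the variable names are only illustrative):

```r
# The first value reappears after a different one, so it is not adjacent to its duplicate
x <- c("DCD97568", "DCD23547", "DCD97568")

# Run-length ids: one id per run of identical consecutive values
r <- rle(x)
run_id <- rep(seq_along(r$lengths), r$lengths)

# Ids by first appearance, as in the match approach
match_id <- match(x, unique(x))

by_run   <- sprintf("DCD%05d", run_id)
by_match <- sprintf("DCD%05d", match_id)

by_run
#[1] "DCD00001" "DCD00002" "DCD00003"
by_match
#[1] "DCD00001" "DCD00002" "DCD00001"
```

If the data cannot be sorted first, the match-based ids keep repeated values on the same code.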
