
How can I sort the words within a variable in an R data.table?

I have messy data in which each record is a string containing 1-3 codes.

library(data.table)
data <- data.table(ID = c(1, 2), text = c("3TC ABC DTG", "3TC DTG ABC"))

Unfortunately, the codes are not written in alphabetical order, and I would like them to be. Both records should become 3TC ABC DTG.

I tried mucking around with splitting the string

data[, c("text1", "text2", "text3") := tstrsplit(text, " ", fixed = TRUE)]

but I cannot find a way to sort and combine these three columns.

I also thought about reshaping, but my dcast call does not give me what I want:

data_long <- melt(data, 
                  id.vars = c("ID"),
                  measure.vars =  c("text1", "text2", "text3"), 
                  na.rm = TRUE)

result <- dcast(data,
                ID ~ variable,
                function (x) paste(x, collapse = " "))

Any way around it?
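
For reference, the reshape attempt above passes `data` rather than `data_long` to dcast, and nothing in it sorts the codes. A minimal sketch of how the long-format route could work is below. It swaps dcast for a grouped aggregation on the melted table, and the column name `text_sorted` is a made-up name for illustration.

```r
library(data.table)

# Rebuild the example data and the split columns from the question
data <- data.table(ID = c(1, 2), text = c("3TC ABC DTG", "3TC DTG ABC"))
data[, c("text1", "text2", "text3") := tstrsplit(text, " ", fixed = TRUE)]

# Long format: one row per ID and code; na.rm drops slots left empty
# when a record has fewer than three codes
data_long <- melt(data,
                  id.vars = "ID",
                  measure.vars = c("text1", "text2", "text3"),
                  na.rm = TRUE)

# Sort the codes within each ID and paste them back together
# (text_sorted is a hypothetical column name)
result <- data_long[, .(text_sorted = paste(sort(value), collapse = " ")),
                    by = ID]
print(result)
```

This gives one row per ID, which can be joined back to the original table on ID if the other columns are needed.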

You were very close. Try:

data[, text_new := unlist( lapply( strsplit( text, " " ), 
                                   function(x) paste0( sort(x), collapse = " "))) ]

   ID        text    text_new
1:  1 3TC ABC DTG 3TC ABC DTG
2:  2 3TC DTG ABC 3TC ABC DTG
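
The same idea can be wrapped in a small helper, which may be easier to reuse on other columns. This is only a sketch: `sort_words` is a hypothetical name, and the third row (a record with just two codes) is added here as an assumption to show that the number of codes does not need to be fixed.

```r
library(data.table)

# Question's data plus one assumed extra row with only two codes
data <- data.table(ID = c(1, 2, 3),
                   text = c("3TC ABC DTG", "3TC DTG ABC", "DTG ABC"))

# Hypothetical helper: split each string on single spaces, sort the
# pieces, and join them again; vapply guarantees a character vector
# with one element per input string
sort_words <- function(x) {
  vapply(strsplit(x, " ", fixed = TRUE),
         function(words) paste(sort(words), collapse = " "),
         character(1))
}

data[, text_new := sort_words(text)]
print(data)
```

Note that sort() collates according to the current locale. If the ordering has to be identical across machines, sort(words, method = "radix") is one option, since it uses the C locale for character vectors.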

