I have a character-vector with the following structure:
GDM3
PER.1.1.1_1
PER.1.10.2_1
PER.1.1.32_1
PER.1.1.4_1
PER.1.1.5_1
PER.11.29.1_1
PER.1.2.2_1
PER.31.2.3_1
PER.1.2.44_1
PER.5.2.25_1
I want to extract the three numbers in the middle of that ID and add a leading zero to any number that has only a single digit. The final vector can be a character vector again. In the end the result should look like this:
GDM3
010101
011002
010132
010104
010105
112901
010202
310203
010244
050225
tmp <- strcapture("\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)_", X$GDM3,
                  proto = list(a=0L, b=0L, c=0L)) |>
  lapply(sprintf, fmt = "%02i")
do.call(paste0, tmp)
# [1] "010101" "011002" "010132" "010104" "010105" "112901" "010202" "310203" "010244" "050225"
Explanation:
strcapture extracts the known patterns into a data.frame, with names and classes defined in proto (the actual values in proto are not used);
lapply(sprintf, fmt = "%02i") zero-pads all columns of the frame to 2 digits;
do.call(paste0, tmp) concatenates each row of the frame into a single string.
Data
X <- structure(list(GDM3 = c("PER.1.1.1_1", "PER.1.10.2_1", "PER.1.1.32_1", "PER.1.1.4_1", "PER.1.1.5_1", "PER.11.29.1_1", "PER.1.2.2_1", "PER.31.2.3_1", "PER.1.2.44_1", "PER.5.2.25_1")), class = "data.frame", row.names = c(NA, -10L))
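The three steps in the explanation above can be traced one at a time. The following is a minimal sketch on a shortened input; the names ids, parts, padded and result are made up for illustration and are not part of the answer.

```r
# Shortened example input (three of the IDs from the question)
ids <- c("PER.1.1.1_1", "PER.1.10.2_1", "PER.11.29.1_1")

# Step 1: capture the three numeric groups into integer columns a, b, c
parts <- strcapture("\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)_", ids,
                    proto = list(a = 0L, b = 0L, c = 0L))
str(parts)  # a data.frame with 3 rows and integer columns a, b, c

# Step 2: zero-pad every column to width 2 (gives a list of character vectors)
padded <- lapply(parts, sprintf, fmt = "%02i")

# Step 3: paste the columns together element-wise, one string per ID
result <- do.call(paste0, padded)
result
# [1] "010101" "011002" "112901"
```

Inspecting parts before padding is a quick way to check that the regular expression captured what was intended.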
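Assuming GDM3 is the character vector shown in the Note at the end, read it into a data frame with read.table and then use sprintf to create the result. Setting comment.char = "_" drops the trailing _1, and sep = "." splits the remainder into the columns V1 to V4.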
with( read.table(text = GDM3, sep = ".", comment.char = "_"),
sprintf("%02d%02d%02d", V2, V3, V4) )
giving:
[1] "010101" "011002" "010132" "010104" "010105" "112901" "010202" "310203"
[9] "010244" "050225"
Note
GDM3 <- c("PER.1.1.1_1", "PER.1.10.2_1", "PER.1.1.32_1", "PER.1.1.4_1",
          "PER.1.1.5_1", "PER.11.29.1_1", "PER.1.2.2_1", "PER.31.2.3_1",
          "PER.1.2.44_1", "PER.5.2.25_1")
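To see what read.table produces before sprintf is applied, the intermediate data frame can be inspected on its own. This is only a sketch on two of the IDs; ids, fields and out are illustrative names.

```r
# Two of the IDs from the question
ids <- c("PER.1.10.2_1", "PER.31.2.3_1")

# comment.char = "_" discards "_1"; sep = "." splits the rest into V1..V4
fields <- read.table(text = ids, sep = ".", comment.char = "_")
str(fields)  # V1 holds "PER", V2..V4 are read as integers

# Format the three integer columns as zero-padded, two-digit fields
out <- with(fields, sprintf("%02d%02d%02d", V2, V3, V4))
out
# [1] "011002" "310203"
```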
Another solution:
X <- structure(list(GDM3 = c("PER.1.1.1_1", "PER.1.10.2_1", "PER.1.1.32_1", "PER.1.1.4_1", "PER.1.1.5_1", "PER.11.29.1_1", "PER.1.2.2_1", "PER.31.2.3_1", "PER.1.2.44_1", "PER.5.2.25_1")), class = "data.frame", row.names = c(NA, -10L))
strsplit(X$GDM3, "\\.|_") |>
sapply(function(x) paste0(sprintf("%02i", as.numeric(x[2:4])), collapse = ""))
#[1] "010101" "011002" "010132" "010104" "010105" "112901" "010202" "310203" "010244" "050225"
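If the same conversion is needed in several places, the strsplit approach could be wrapped in a small helper. The sketch below is one possible variant, not the answer's own code: pad_middle and its width argument are hypothetical names, vapply is used instead of sapply so the return type is always character, and formatC replaces sprintf so the padding width can be passed as an argument.

```r
# Hypothetical helper: pad the three middle numbers of each ID to `width` digits
pad_middle <- function(ids, width = 2L) {
  pieces <- strsplit(ids, "[._]")           # split on "." or "_"
  vapply(pieces, function(p) {
    nums <- as.integer(p[2:4])              # the three middle numbers
    paste0(formatC(nums, width = width, flag = "0"), collapse = "")
  }, character(1))
}

pad_middle(c("PER.1.1.32_1", "PER.5.2.25_1"))
# [1] "010132" "050225"
```

The sketch assumes every ID has the shape shown in the question, with a prefix, three dot-separated numbers and an underscore suffix.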