
Extract numbers from a character vector and add leading zeros

I have a character vector with the following structure:

GDM3
PER.1.1.1_1
PER.1.10.2_1
PER.1.1.32_1
PER.1.1.4_1
PER.1.1.5_1
PER.11.29.1_1
PER.1.2.2_1
PER.31.2.3_1
PER.1.2.44_1
PER.5.2.25_1

I want to extract the three numbers in the middle of each ID and add a leading zero to any of them that has only a single digit. The final vector can be a character vector again. In the end, the result should look like this:

GDM3
010101
011002
010132
010104
010105
112901
010202
310203
010244
050225
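# Capture the three middle numbers as integer columns a, b, c, then zero-pad each column
tmp <- strcapture("\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)_", X$GDM3, 
                  proto = list(a=0L, b=0L, c=0L)) |>
  lapply(sprintf, fmt = "%02i")
# Glue the padded columns together row by row
do.call(paste0, tmp)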
#  [1] "010101" "011002" "010132" "010104" "010105" "112901" "010202" "310203" "010244" "050225"

Explanation:

  • strcapture extracts the three captured groups into a data.frame whose column names and classes are taken from proto (the actual values in proto are not used);
  • lapply(sprintf, fmt="%02i") zero-pads every column of the frame to two digits;
  • do.call(paste0, tmp) concatenates each row of the frame into a single string.
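To make the three steps above concrete, here is a small sketch that runs them on two IDs from the question and keeps each intermediate result. The variable names ids, parts, padded, res and res_spaced are only illustrative. The last step also shows why paste0 rather than paste is needed here.

```r
# Two IDs from the question, used to show each intermediate step
ids <- c("PER.1.10.2_1", "PER.31.2.3_1")

# Step 1: strcapture() returns a data.frame; names/classes come from proto,
# so columns a, b, c are integer vectors (a = 1, 31; b = 10, 2; c = 2, 3)
parts <- strcapture("\\.([0-9]+)\\.([0-9]+)\\.([0-9]+)_", ids,
                    proto = list(a = 0L, b = 0L, c = 0L))

# Step 2: lapply() over a data.frame iterates over its columns; each column
# is passed to sprintf() positionally, with fmt supplied by name
padded <- lapply(parts, sprintf, fmt = "%02i")
# padded is list(a = c("01", "31"), b = c("10", "02"), c = c("02", "03"))

# Step 3: do.call() passes the three vectors as separate arguments,
# so paste0() joins them element-wise (row by row) with no separator
res <- do.call(paste0, padded)
print(res)

# paste() would instead insert its default sep = " " between the pieces
res_spaced <- do.call(paste, padded)
print(res_spaced)
```

With paste0 the two IDs become "011002" and "310203"; with paste they become "01 10 02" and "31 02 03".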

Data

X <- structure(list(GDM3 = c("PER.1.1.1_1", "PER.1.10.2_1", "PER.1.1.32_1", "PER.1.1.4_1", "PER.1.1.5_1", "PER.11.29.1_1", "PER.1.2.2_1", "PER.31.2.3_1", "PER.1.2.44_1", "PER.5.2.25_1")), class = "data.frame", row.names = c(NA, -10L))

Assuming GDM3 is the vector shown in the Note at the end, read it into a data frame with read.table and then use sprintf to create the result.

with( read.table(text = GDM3, sep = ".", comment.char = "_"), 
  sprintf("%02d%02d%02d", V2, V3, V4) )

giving:

 [1] "010101" "011002" "010132" "010104" "010105" "112901" "010202" "310203"
 [9] "010244" "050225"

Note

GDM3 <- c("PER.1.1.1_1", "PER.1.10.2_1", "PER.1.1.32_1", "PER.1.1.4_1", 
  "PER.1.1.5_1", "PER.11.29.1_1", "PER.1.2.2_1", "PER.31.2.3_1", 
  "PER.1.2.44_1", "PER.5.2.25_1")
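The read.table answer relies on two arguments doing the parsing. The sketch below shows the intermediate data frame for two of the IDs. The names ids, df and res are illustrative only.

```r
# Two IDs from the question
ids <- c("PER.1.10.2_1", "PER.31.2.3_1")

# comment.char = "_" makes read.table() discard everything from "_" to the
# end of each line, so the "_1" suffix is dropped before parsing;
# sep = "." then splits "PER.1.10.2" into four fields V1..V4
df <- read.table(text = ids, sep = ".", comment.char = "_")
str(df)  # V1 holds "PER"; V2, V3, V4 are converted to integers

# %02d pads each of the three number columns to at least two digits
res <- with(df, sprintf("%02d%02d%02d", V2, V3, V4))
print(res)
```

Because read.table converts V2 to V4 to integers, the integer format %02d can be used directly without any explicit as.integer call.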

Another solution:

X <- structure(list(GDM3 = c("PER.1.1.1_1", "PER.1.10.2_1", "PER.1.1.32_1", "PER.1.1.4_1", "PER.1.1.5_1", "PER.11.29.1_1", "PER.1.2.2_1", "PER.31.2.3_1", "PER.1.2.44_1", "PER.5.2.25_1")), class = "data.frame", row.names = c(NA, -10L))
# Split on "." or "_"; elements 2:4 of each result are the three middle numbers
strsplit(X$GDM3, "\\.|_") |>
  sapply(function(x) paste0(sprintf("%02i", as.numeric(x[2:4])), collapse = ""))
#[1] "010101" "011002" "010132" "010104" "010105" "112901" "010202" "310203" "010244" "050225"
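To see why the strsplit solution indexes x[2:4], here is a sketch of the function body applied to a single ID. The names id, pieces, padded and res are illustrative only.

```r
id <- "PER.1.10.2_1"

# "\\.|_" is a regex alternation: split on a literal dot OR an underscore
pieces <- strsplit(id, "\\.|_")[[1]]
print(pieces)  # pieces is c("PER", "1", "10", "2", "1"); elements 2:4 are the middle numbers

# Convert to integer, zero-pad each to two digits, then glue into one string
padded <- sprintf("%02i", as.integer(pieces[2:4]))
res <- paste0(padded, collapse = "")
print(res)  # "011002"
```

Note that "%02i" sets a minimum width, not a fixed one: sprintf("%02i", 123L) returns "123". If any of the three components could exceed 99, the concatenated results would no longer all have the same width. This applies to all three answers.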
