
R - combine two character vectors, then cut off last character

I am handling a large data set which stores a 9-digit ID in two columns, say ID_part_1 and ID_part_2.

ID part 1 is a common identifier for a top-level specification, so its values are duplicated throughout that column, while ID part 2 is unique within each ID part 1. I want to combine part 1 with part 2 and then cut off the last character (letter or digit) of the resulting strings.

See the example data below:

    ID_part_1    ID_part_2    Comb_ID
    G12345       678          G1234567
    G12345       679          G1234567
    A23567       9C1          A235679C
    123456       789          12345678

All data is stored in a data.table, say my_data.dt, so the columns can be addressed easily. Both columns ID_part_1 and ID_part_2 are of type "character". The computed results should be stored in column Comb_ID. After trimming the last character off the combined string, I will extract all unique values from the computed column with:

unique(my_data.dt[, Comb_ID])

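We can use substr with paste0 in base R. In the example data ID_part_2 always has three characters, so dropping the last character of the combined string is the same as keeping only the first two characters of ID_part_2: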

my_data.dt$Comb_ID <- with(my_data.dt,
      paste0(ID_part_1, substr(ID_part_2, 1, 2)))

my_data.dt$Comb_ID
#[1] "G1234567" "G1234567" "A235679C" "12345678"

NOTE: No packages are needed

data

my_data.dt <- structure(list(ID_part_1 = c("G12345", "G12345", "A23567", "123456"
), ID_part_2 = c("678", "679", "9C1", "789"), Comb_ID = c("G1234567", 
"G1234567", "A235679C", "12345678")), class = "data.frame", row.names = c(NA, 
-4L))
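If ID_part_2 is not guaranteed to be exactly three characters long, a slightly more general variant is to paste first and then trim by length. The following is only a sketch in base R; the helper name combine_and_trim and the data frame name ids are made up for illustration and do not come from the question.

```r
# Example data mirroring the question (base R only, no packages).
ids <- data.frame(
  ID_part_1 = c("G12345", "G12345", "A23567", "123456"),
  ID_part_2 = c("678", "679", "9C1", "789"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join the two parts, then drop the final character,
# whatever the length of either part.
combine_and_trim <- function(part_1, part_2) {
  combined <- paste0(part_1, part_2)
  substr(combined, 1, nchar(combined) - 1)
}

ids$Comb_ID <- combine_and_trim(ids$ID_part_1, ids$ID_part_2)

# Distinct combined IDs, as in the question's unique() call.
unique_ids <- unique(ids$Comb_ID)
```

Because substr() recycles its stop argument, each string is trimmed according to its own length, so parts of differing lengths are handled in one vectorised call.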

An option based on the tidyverse:

library(dplyr)
library(stringr)

# str_c() is vectorised, so a row-wise map2_chr() call is not needed
data %>%
  mutate(Comb_ID = str_c(ID_part_1, ID_part_2),
         # a negative end index counts from the end: -2 drops the last character
         Comb_ID = str_sub(Comb_ID, 1, -2))


#    ID_part_1 ID_part_2  Comb_ID
# 1:    G12345       678 G1234567
# 2:    G12345       679 G1234567
# 3:    A23567       9C1 A235679C
# 4:    123456       789 12345678

Data

# The .internal.selfref pointer printed by dput() cannot be parsed, so it is dropped here
data <- structure(list(ID_part_1 = c("G12345", "G12345", "A23567", "123456"
), ID_part_2 = c("678", "679", "9C1", "789")), row.names = c(NA, 
-4L), class = c("data.table", "data.frame"))
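
Since the question keeps the data in a data.table, the same step could also be written with data.table's own `:=` assignment, which adds the column by reference instead of copying the table. This is a sketch, not one of the posted answers; it assumes the data.table package is installed and rebuilds the example table from scratch.

```r
library(data.table)

# Example table rebuilt from the question's data.
my_data.dt <- data.table(
  ID_part_1 = c("G12345", "G12345", "A23567", "123456"),
  ID_part_2 = c("678", "679", "9C1", "789")
)

# Add Comb_ID by reference: paste the parts, then keep all but the last character.
my_data.dt[, Comb_ID := {
  combined <- paste0(ID_part_1, ID_part_2)
  substr(combined, 1, nchar(combined) - 1)
}]

# Distinct combined IDs, as in the question's unique() call.
unique_ids <- unique(my_data.dt[, Comb_ID])
```

On a large table, updating by reference avoids the extra copy that `my_data.dt$Comb_ID <- ...` may make.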


 