
R - Extract numbers before characters, create a list

Good morning,

I have a data frame where one of the columns has observations that look like this:

row1: 28316496(15)|28943784(8)|28579919(7)

row2: 29343898(1)

I would like to extract, for each row, the numbers that are not in parentheses, and then combine the results from all rows into a single list.

In other words, I would like to end up with the following list:

28316496;28943784;28579919;29343898

It could also be any other similar object; I am just interested in getting all these numbers and matching them against another dataset.

I have tried using str_extract_all to extract the numbers but I am having trouble understanding the pattern argument. For instance I have tried:

str_extract_all("28316496(15)|28943784(8)", "\d+(\d)")

and

gsub("\s*\(.*", "", "28316496(15)|28943784(8)")

but it is not returning exactly what I want.

Any ideas for extracting the numbers outside the parentheses and building one big list out of them?

Thanks a lot!

Here is a way.

x <- c("28316496(15)|28943784(8)|28579919(7)", "29343898(1)")

y <- strsplit(x, "\\|")
y <- lapply(y, \(.y) sub("\\([^\\(\\)]+\\)$", "", .y))
y
#> [[1]]
#> [1] "28316496" "28943784" "28579919"
#> 
#> [[2]]
#> [1] "29343898"

Created on 2022-09-24 with reprex v2.0.2
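The result above is still one character vector per row, while the question asks for a single list. A minimal sketch of that last step, assuming every piece ends in a parenthesised count as in the sample data (the names `ids` and `ids_string` are made up for this illustration):

```r
x <- c("28316496(15)|28943784(8)|28579919(7)", "29343898(1)")

# Split each row on the literal pipe, then drop the trailing "(n)" from each piece
y <- lapply(strsplit(x, "\\|"), function(p) sub("\\(\\d+\\)$", "", p))

# Flatten the per-row list into one character vector
ids <- unlist(y)

# Collapse into the semicolon-separated form shown in the question
ids_string <- paste(ids, collapse = ";")
ids_string
#> [1] "28316496;28943784;28579919;29343898"
```

For matching against another dataset, the plain vector `ids` is probably more convenient than the collapsed string.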

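In base R, we can use gsub to remove each "(", the digits that follow it, and the closing ")", and then use read.table to read the remaining pipe-separated values into a data.frame. Rows with fewer values are padded with NA because of fill = TRUE.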

read.table(text = gsub("\\(\\d+\\)", "", df1$col1), 
    header = FALSE, sep = "|", fill = TRUE)
        V1       V2       V3
1 28316496 28943784 28579919
2 29343898       NA       NA
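If a single vector is wanted instead of the wide data.frame, the table can be flattened row by row and the NA padding dropped. This is only a sketch built on the sample data; the names `wide` and `ids` are introduced here for illustration.

```r
df1 <- data.frame(
  col1 = c("28316496(15)|28943784(8)|28579919(7)", "29343898(1)"),
  stringsAsFactors = FALSE
)

# Same step as above: strip the "(n)" parts and read the rest as a table
wide <- read.table(text = gsub("\\(\\d+\\)", "", df1$col1),
                   header = FALSE, sep = "|", fill = TRUE)

# t() turns rows into columns, so c() reads the values row by row
ids <- c(t(wide))

# Drop the NA padding that fill = TRUE added for the shorter row
ids <- ids[!is.na(ids)]
ids
```

Note that read.table converts the values to integers here, so they may need as.character() before being compared with IDs stored as text.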

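Or use str_extract_all from stringr with a regex lookahead: "\\d+(?=\\()" matches a run of digits only when it is immediately followed by "(", without including the "(" in the match.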

library(stringr)
str_extract_all(df1$col1, "\\d+(?=\\()")
[[1]]
[1] "28316496" "28943784" "28579919"

[[2]]
[1] "29343898"

data

df1 <- structure(list(col1 = c("28316496(15)|28943784(8)|28579919(7)", 
"29343898(1)")), class = "data.frame", row.names = c(NA, -2L))
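To finish the task from the question, the extracted numbers can be flattened and matched against the second dataset. The sketch below uses the same lookahead pattern in base R (regmatches with gregexpr and perl = TRUE) so it runs without stringr. The data frame `other` and its column `id` are invented for this example and do not come from the question.

```r
df1 <- data.frame(
  col1 = c("28316496(15)|28943784(8)|28579919(7)", "29343898(1)"),
  stringsAsFactors = FALSE
)

# Hypothetical second dataset to match against (not from the original post)
other <- data.frame(
  id = c("28943784", "29343898", "11111111"),
  stringsAsFactors = FALSE
)

# Digits that are directly followed by "(", with the "(" left out of the match
hits <- regmatches(df1$col1, gregexpr("\\d+(?=\\()", df1$col1, perl = TRUE))

# One flat character vector with the numbers from all rows
ids <- unlist(hits)

# Keep only the rows of `other` whose id appears among the extracted numbers
matched <- other[other$id %in% ids, , drop = FALSE]
matched$id
#> [1] "28943784" "29343898"
```

The IDs are kept as character here, so the comparison with `%in%` assumes the other dataset also stores them as text.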
