I have a vector of character ids, as rownames of a data frame in R. The rownames have the following pattern:
head(foo)
[1] "ENSG00000197372 (ZNF675)" "ENSG00000112624 (GLTSCR1L)"
[3] "ENSG00000151320 (AKAP6)" "ENSG00000139910 (NOVA1)"
[5] "ENSG00000137449 (CPEB2)" "ENSG00000004779 (NDUFAB1)"
I would like to subset the above rownames (~700 entries) to keep only the gene symbol inside the parentheses (e.g. ZNF675) and drop the rest. Is this possible with a function like gsub?
We can use sub to match the leading characters that are not "(", then capture the characters after the "(" that are not ")", and replace the whole string with the backreference (\\1) to the captured group:
row.names(foo) <- sub("^[^(]+\\(([^)]+).*", "\\1", row.names(foo))
row.names(foo)
#[1] "ZNF675" "GLTSCR1L" "AKAP6" "NOVA1" "CPEB2" "NDUFAB1"
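Two things can go wrong when the same pattern is applied to all ~700 row names: sub returns its input unchanged when the pattern does not match, and data frame row names must be unique, so two Ensembl ids that share a symbol would make the assignment fail. The following is a minimal sketch of a guard for both cases. The ids, the data frame name demo, and the duplicated AKAP6 entry are made up for illustration and are not taken from the question's data.

```r
# Hypothetical ids: the last two deliberately share a gene symbol
ids <- c("ENSG00000197372 (ZNF675)",
         "ENSG00000151320 (AKAP6)",
         "ENSG00000999999 (AKAP6)")
pattern <- "^[^(]+\\(([^)]+).*"

# sub() leaves non-matching strings untouched, so check that every id matches
stopifnot(all(grepl(pattern, ids)))

symbols <- sub(pattern, "\\1", ids)

# Row names must be unique; make.unique() appends .1, .2, ... to repeated values
unique_symbols <- make.unique(symbols)

demo <- data.frame(col1 = c(0.1, 0.2, 0.3), row.names = ids)
row.names(demo) <- unique_symbols
row.names(demo)
#[1] "ZNF675"  "AKAP6"   "AKAP6.1"
```

Whether a suffix such as .1 is acceptable depends on the downstream analysis; an alternative is to keep the Ensembl ids as row names and store the symbols in a separate column.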
Or using str_extract from stringr:
library(stringr)
str_extract(row.names(foo), "(?<=\\()[^)]+")
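If stringr is not available, the same lookbehind pattern should also work in base R through regexpr and regmatches with perl = TRUE. This is only a sketch: the third id below is a made-up entry without a symbol, included to show that non-matching elements come back as NA here, much as str_extract would return them.

```r
# The last id is hypothetical and has no "(SYMBOL)" part
ids <- c("ENSG00000197372 (ZNF675)",
         "ENSG00000151320 (AKAP6)",
         "ENSG00000000001")

# regexpr() returns the match position, or -1 where there is no match
m <- regexpr("(?<=\\()[^)]+", ids, perl = TRUE)

# regmatches() returns only the matched pieces, so fill a full-length vector
symbols <- rep(NA_character_, length(ids))
symbols[m > 0] <- regmatches(ids, m)
symbols
#[1] "ZNF675" "AKAP6"  NA
```

Any NA would need to be handled before assigning the result to row.names, since missing row names are not allowed.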
Data used in the examples above:
foo <- data.frame(col1 = rnorm(6))
row.names(foo) <- c("ENSG00000197372 (ZNF675)",
"ENSG00000112624 (GLTSCR1L)", "ENSG00000151320 (AKAP6)",
"ENSG00000139910 (NOVA1)",
"ENSG00000137449 (CPEB2)", "ENSG00000004779 (NDUFAB1)")