Substring content between quotation marks

In a data frame (DF) I have column entries of different lengths, like the following:

tmp_ezg.\"dr_HE_10691\" , tmp_ezg.\"dr_MV_0110200016\" , tmp_ezg.\"dr_MV_0111290017\" etc.

How can I best substring what's in between the quotation marks?

My idea:

substring(DF$name, 10)

Since the content between the quotation marks varies in length, I cannot give substring() a fixed position at which to stop.

Is it possible to take the substring only between certain symbols (i.e. quotation marks)?

For example

x <- c('tmp_ezg.\"dr_HE_10691\"' , 
       'tmp_ezg.\"dr_MV_0110200016\"' , 
       'tmp_ezg.\"dr_MV_0111290017\"')
res <- sub('.*?"([^"]+)"', "\\1", x)
print(res, quote = FALSE)
# [1] dr_HE_10691      dr_MV_0110200016 dr_MV_0111290017

... if I'm not mistaken.
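
Applied to the data frame from the question, the same idea might look like the sketch below. DF and its name column are taken from the question; the new column inner is only an illustrative name, and the pattern here is a slightly stricter variant that also drops any text after the closing quotation mark. Entries without a quoted part would be returned unchanged by sub().

```r
# Example data, assuming the column holds plain character strings
DF <- data.frame(name = c('tmp_ezg."dr_HE_10691"',
                          'tmp_ezg."dr_MV_0110200016"',
                          'tmp_ezg."dr_MV_0111290017"'),
                 stringsAsFactors = FALSE)

# Capture everything between the first pair of quotation marks;
# the parts of the pattern around the group consume the rest of the string
DF$inner <- sub('^[^"]*"([^"]*)".*$', "\\1", DF$name)
```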

To separate out the content between the quotation marks (assuming there are exactly two in each entry), you can split each string on the quotation mark. The split string "\\\"" used below is the regex \" (an escaped quotation mark); a plain '"' works as well:

y <- strsplit(x, split = "\\\"")

If all entries end with a quotation mark, this will give you a list of entries with two values, and the second value in each entry is your string.

[[1]]
[1] "tmp_ezg."         "dr_HE_10691"
[[2]]
[1] "tmp_ezg."         "dr_MV_0110200016"
[[3]]
[1] "tmp_ezg."         "dr_MV_0111290017"
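
To collect the second piece of every entry into a plain character vector, something along these lines should work. It is a minimal sketch that assumes each entry contains exactly one quoted section, as in the example; inner is just an illustrative name.

```r
x <- c('tmp_ezg."dr_HE_10691"',
       'tmp_ezg."dr_MV_0110200016"',
       'tmp_ezg."dr_MV_0111290017"')

# Split on the quotation mark; fixed = TRUE treats it as a literal character
y <- strsplit(x, split = '"', fixed = TRUE)

# Take the second piece of each entry; vapply enforces one string per entry
inner <- vapply(y, function(parts) parts[2], character(1))
```

vapply() is used instead of sapply() so that an entry with an unexpected shape raises an error rather than silently changing the type of the result.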
