gsub and remove all characters between < and > in R

I have a string:

a="<gml:posList srsDimension=\"2\" count=\"5\">7 -5.067 -3 56.7 -3.3 58.3 -5.65 57 -8.33</gml:posList>"

and want to remove everything between the < and > with gsub, to no avail so far. I want only the numbers to remain (i.e. 7 -5 -3 56 -3 58...), so that I can take every even/odd element for processing.

I tried the approach from "Remove all text between two brackets", to no avail:

    > gsub('<^|*>','',a[[1]],perl=TRUE)
    Error in gsub("<^|*>", "", a[[1]], perl = TRUE) :
      invalid regular expression '<^|*>'
    In addition: Warning message:
    In gsub("<^|*>", "", a[[1]], perl = TRUE) : PCRE pattern compilation error
        'nothing to repeat'
        at '*>'

and

gsub('<gml.+>\\d','',a[[1]])

which also removes the first digit.

I am sure I am missing something obvious, as '<' is not a special character.

Here are some other tries (and fails)

> gsub('<.+>','',a[[1]])
[1] ""
> gsub('<.+>.+<.+>','',a[[1]])
[1] ""
> gsub('<gml.+>','',a[[1]])
[1] ""

You can use

    > gsub("<[^>]+>", "", a)
    [1] "7 -5.067 -3 56.7 -3.3 58.3 -5.65 57 -8.33"

"<" and ">" are literals, "[^>]" matches any single character that is not ">", and "+" allows one or more such characters. Because "[^>]+" cannot run past a ">", each tag is matched on its own; a greedy ".+" instead runs from the first "<" to the last ">", which is why '<.+>' returned "". gsub replaces every match of the pattern, here with the empty string "".

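As a minimal sketch of why the attempts in the question behave differently from the negated-class pattern, the snippet below compares a greedy match, the negated class, and a lazy quantifier on the same string. The variable names (greedy, negated, lazy) are only illustrative, and the lazy variant is an alternative not taken from the answer above. The first attempt, '<^|*>', fails for a separate reason: "*" directly follows "|", so there is nothing for it to repeat, as the error message states.

```r
a <- "<gml:posList srsDimension=\"2\" count=\"5\">7 -5.067 -3 56.7 -3.3 58.3 -5.65 57 -8.33</gml:posList>"

# Greedy: ".+" extends from the first "<" to the LAST ">", so the whole
# string (tags and numbers) is one match and gets removed.
greedy <- gsub("<.+>", "", a)

# Negated class: "[^>]+" cannot cross a ">", so each tag is a separate match.
negated <- gsub("<[^>]+>", "", a)

# Lazy quantifier (alternative): ".+?" stops at the first ">" it reaches.
lazy <- gsub("<.+?>", "", a, perl = TRUE)

print(greedy)
print(negated)
print(lazy)
```

The negated class is usually the more portable choice, since it does not depend on lazy-quantifier support in the regex engine.
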
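Alternatively, the qdapRegex package provides rm_between, which by default removes the markers together with everything between them (with extract = TRUE it would instead return the text inside the tags):

library(qdapRegex)
a <- "<gml:posList srsDimension=\"2\" count=\"5\">7 -5.067 -3 56.7 -3.3 58.3 -5.65 57 -8.33</gml:posList>"
rm_between(a, "<", ">")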
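
To get from the cleaned string to the even/odd elements mentioned in the question, one possible follow-up in base R is sketched below. It assumes the values are separated by single spaces, as in the example string; the names nums, odd and even are illustrative.

```r
a <- "<gml:posList srsDimension=\"2\" count=\"5\">7 -5.067 -3 56.7 -3.3 58.3 -5.65 57 -8.33</gml:posList>"

# Strip the tags, then split on single spaces and convert to numeric.
cleaned <- gsub("<[^>]+>", "", a)
nums <- as.numeric(strsplit(cleaned, " ", fixed = TRUE)[[1]])

# A logical index is recycled along the vector, which selects every
# other element starting at position 1 (odd) or position 2 (even).
odd  <- nums[c(TRUE, FALSE)]
even <- nums[c(FALSE, TRUE)]

print(odd)
print(even)
```
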
