I have
tmp_body_symbols="things <st>hello</st> and <st>blue</st> by <st>orange</st>"
str1_markerstring = "<st>"
str2_markerstring = "</st>"
frags << tmp_body_symbols[/#{str1_markerstring}(.*?)#{str2_markerstring}/m, 1]
This only gives me "hello", but I want ["hello","blue","orange"].
How would I do that?
Use scan:
tmp_body_symbols.scan(/#{str1_markerstring}(.*?)#{str2_markerstring}/m).flatten
See also: Ruby docs for String#scan.
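If the markers might ever contain regex metacharacters, it may be safer to escape them before interpolating. Here is a minimal sketch of that idea; `extract_between` is a hypothetical helper name for illustration, not something from the question:

```ruby
# Hypothetical helper: collect every substring found between two literal markers.
def extract_between(text, open_marker, close_marker)
  # Regexp.escape makes the markers match literally, even if they contain
  # characters such as "." or "[" that are special in a regex.
  pattern = /#{Regexp.escape(open_marker)}(.*?)#{Regexp.escape(close_marker)}/m
  # With one capture group, scan returns an array of one-element arrays;
  # flatten turns [["hello"], ["blue"]] into ["hello", "blue"].
  text.scan(pattern).flatten
end

tmp_body_symbols = "things <st>hello</st> and <st>blue</st> by <st>orange</st>"
frags = extract_between(tmp_body_symbols, "<st>", "</st>")
p frags
#=> ["hello", "blue", "orange"]
```

For the `<st>` markers in the question the escaping changes nothing, so the one-liner above is enough; it only matters for markers like "[x]" or "(a)".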
You can use Nokogiri to parse HTML/XML:
require 'nokogiri'
doc = Nokogiri::HTML::Document.parse("things <st>hello</st> and <st>blue</st> by <st>orange</st>")
doc.css('st').map(&:text)
#=> ["hello", "blue", "orange"]
More Info : http://www.nokogiri.org/tutorials/parsing_an_html_xml_document.html
You can do this with a capture group, as @Doorknob has done, or without a capture group, by using a ("zero-width") positive look-behind and positive look-ahead:
tmp = "things <st>hello</st> and <st>blue</st> by <st>orange</st>"
s1 = "<st>"
s2 = "</st>"
tmp.scan(/(?<=#{ s1 }).*?(?=#{ s2 })/).flatten
#=> ["hello", "blue", "orange"]
(?<=#{ s1 }), which evaluates to (?<=<st>), is the positive look-behind. (?=#{ s2 }), which evaluates to (?=</st>), is the positive look-ahead. The ? following .* makes it "non-greedy". Without it:
tmp.scan(/(?<=#{ s1 }).*(?=#{ s2 })/).flatten
#=> ["hello</st> and <st>blue</st> by <st>orange"]
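To compare the two side by side, here is a small sketch of the non-greedy and greedy look-around patterns run on the same string. Since these patterns have no capture group, scan already returns a flat array of strings, so this version leaves out flatten. The `Regexp.escape` calls and the variable names `lazy` and `greedy` are my additions for illustration:

```ruby
tmp = "things <st>hello</st> and <st>blue</st> by <st>orange</st>"
# Escaping is a no-op for these markers; it only matters if a marker
# contains regex metacharacters.
s1 = Regexp.escape("<st>")
s2 = Regexp.escape("</st>")

# Non-greedy: each match stops at the first closing marker it can reach.
lazy = tmp.scan(/(?<=#{s1}).*?(?=#{s2})/)
# Greedy: the match runs from the first opening marker to the last closing one.
greedy = tmp.scan(/(?<=#{s1}).*(?=#{s2})/)

p lazy
#=> ["hello", "blue", "orange"]
p greedy
#=> ["hello</st> and <st>blue</st> by <st>orange"]
```

Note that neither pattern here uses the /m flag, so . will not match a newline; add /m if the text between the markers can span several lines.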