
Is it possible to exclude some of the string used to match from Ruby regexp data?

I have a bunch of strings that look, for example, like this:

<option value="Spain">Spain</option>

And I want to extract the name of the country from inside.

The easiest way I could think of to do this in Ruby was to use a regular expression of this form:

country  = line.match(/>(.+)</)

However, this returns >Spain<. So I did this:

line.match(/>(.+)</).to_s.gsub!(/<|>/,"")

This works well enough, but I'd be surprised if there isn't a more elegant way. It seems natural to use a regular expression to declare how to find the thing you want, without having the enclosing strings that were used to match it end up in the data that gets returned.

Is there a conventional approach to this problem?

The right way to deal with that string is to use an HTML parser, for example:

require 'nokogiri'
country = Nokogiri::HTML('<option value="Spain">Spain</option>').at('option').text

And if you have several such strings, paste them together and use search:

html      = '<option value="Spain">Spain</option><option value="Canada">Canada</option>'
countries = Nokogiri::HTML(html).search('option').map(&:text)
# ["Spain", "Canada"]

But if you must use a regex, then:

country = '<option value="Spain">Spain</option>'.match('>([^<]+)<')[1]
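One caveat with the line above: match returns nil when nothing matches, so calling [1] on the result raises NoMethodError for a line without the expected markup. A minimal sketch of two nil-safe variants follows; the helper names extract_country and extract_country_via_match are hypothetical, chosen only for illustration.

```ruby
# Hypothetical helper: returns the captured text, or nil when the line does not match.
def extract_country(line)
  # String#[] with a regexp and a capture index returns only that group.
  line[/>([^<]+)</, 1]
end

# Same idea with match and the safe-navigation operator (Ruby 2.3+).
def extract_country_via_match(line)
  line.match(/>([^<]+)</)&.[](1)
end

spain   = extract_country('<option value="Spain">Spain</option>')  # "Spain"
missing = extract_country('no markup here')                        # nil
```

Either form keeps the delimiters in the pattern but out of the returned value, which is what the question asks for.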

Keep in mind that match actually returns a MatchData object, and the documentation for MatchData#to_s says:

Returns the entire matched string.

But you can access the captured groups using MatchData#[]. And if you don't like counting, you could use a named capture group as well:

country = '<option value="Spain">Spain</option>'.match('>(?<name>[^<]+)<')['name']
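To make the difference between the whole match and the captured group concrete, here is a small sketch using the same string. The variable names are illustrative only.

```ruby
line = '<option value="Spain">Spain</option>'
m    = line.match(/>(?<name>[^<]+)</)

whole    = m.to_s            # ">Spain<"  (the entire matched string)
index0   = m[0]              # ">Spain<"  (same as to_s)
by_index = m[1]              # "Spain"    (first capture group)
by_name  = m[:name]          # "Spain"    (named group, symbol or string key)
all_caps = m.captures        # ["Spain"]
as_hash  = m.named_captures  # {"name"=>"Spain"}
```

This is why to_s followed by gsub was needed in the question: to_s always includes the delimiters, while the capture group already excludes them.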
