Is it possible to exclude some of the string used to match from Ruby regexp data?

Question

I have a bunch of strings that look, for example, like this:

<option value="Spain">Spain</option>

And I want to extract the name of the country from inside.

The easiest way I could think of to do this in Ruby was to use a regular expression of this form:

country  = line.match(/>(.+)</)

However, this returns >Spain<, angle brackets included. So I did this:

line.match(/>(.+)</).to_s.gsub!(/<|>/,"")

This works well enough, but I'd be surprised if there isn't a more elegant way to do it. It seems like a common need: using a regular expression to describe how to find the thing you want, without the surrounding text that was used to match it ending up in the data that gets returned.

Is there a conventional approach to this problem?

Answer 1

The right way to deal with that string is to use an HTML parser, for example:

require 'nokogiri'
country = Nokogiri::HTML('<option value="Spain">Spain</option>').at('option').text

And if you have several such strings, paste them together and use search:

html      = '<option value="Spain">Spain</option><option value="Canada">Canada</option>'
countries = Nokogiri::HTML(html).search('option').map(&:text)
# ["Spain", "Canada"]

But if you must use a regex, then:

country = '<option value="Spain">Spain</option>'.match('>([^<]+)<')[1]
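A slightly shorter variant, not part of the original answer, is String#[] with a regex and a capture-group index, which returns just that group. Unlike match(...)[1], it returns nil rather than raising NoMethodError when a line does not match. The sample lines below are made up for illustration:

```ruby
# Hypothetical sample input: one line that matches and one that does not.
lines = [
  '<option value="Spain">Spain</option>',
  'no tags here'
]

# String#[](regexp, capture) returns only the requested capture group,
# or nil when the regexp does not match at all.
countries = lines.map { |line| line[/>([^<]+)</, 1] }
p countries # prints ["Spain", nil]

# By contrast, String#match returns nil on no match, and nil has no [] method,
# so chaining [1] onto it raises NoMethodError.
error_class = begin
  'no tags here'.match(/>([^<]+)</)[1]
  nil
rescue NoMethodError => e
  e.class
end
p error_class # prints NoMethodError
```

Which one to prefer depends on whether a non-matching line should be silently skipped (nil) or treated as an error.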

Keep in mind that match actually returns a MatchData object, and MatchData#to_s:

Returns the entire matched string.
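To illustrate with the question's own pattern: the MatchData returned by line.match(/>(.+)</) already holds the country name in its first capture group. Only to_s, which is the same as [0], includes the brackets. A minimal sketch:

```ruby
line = '<option value="Spain">Spain</option>'
m = line.match(/>(.+)</)

p m.class    # prints MatchData
p m.to_s     # prints ">Spain<"  (the entire match, same as m[0])
p m[0]       # prints ">Spain<"
p m[1]       # prints "Spain"    (first capture group only)
p m.captures # prints ["Spain"]  (all capture groups as an array)
```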

But you can access the captured groups using MatchData#[]. And if you don't like counting, you could use a named capture group as well:

country = '<option value="Spain">Spain</option>'.match('>(?<name>[^<]+)<')['name']
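Another regex-only option, not part of the original answer, addresses the question's wish more directly: the zero-width lookbehind (?<=>) and lookahead (?=<) require the delimiters to be present without including them in the match, so the whole match is already the country name. Combined with String#scan, this also handles several options in one string. A sketch, reusing the sample HTML from above:

```ruby
html = '<option value="Spain">Spain</option><option value="Canada">Canada</option>'

# (?<=>) and (?=<) only assert that ">" comes before and "<" comes after;
# they consume no characters, so neither bracket is part of the match.
pattern = /(?<=>)[^<]+(?=<)/

first = html[pattern]      # first whole match
all   = html.scan(pattern) # pattern has no capture groups, so scan returns whole matches
p first # prints "Spain"
p all   # prints ["Spain", "Canada"]
```

The usual regex caveat still applies: this assumes the option text never contains < or nested tags. For anything less regular, the Nokogiri approach above is the safer choice.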

Answer 1: 5 votes, accepted, 2011-09-06 03:26:17