
Ruby Regex, Only One Capture (Very Simple!)

I'm guessing this is a silly mistake on my part, but the following returns an array containing only "M":

/(.)+?/.match("Many many characters!").captures
=> ["M"]

Why doesn't it return an array of every character? I must have missed something blatantly obvious, because I can't see what's wrong with this.

Edit: Just realised, I don't need the +? but it still doesn't work without it.

Edit: Apologies! I will clarify. My goal is to let users supply a regular expression, some styling, and an input text file. Wherever there is a match, the text will be wrapped in an HTML element and the styling applied. I am not just splitting the string into characters; I only used that regex because it was the simplest, which was a poor choice on my part. How do I get capture groups from scan(), or is that not possible? I see that $1 contains "!" (the last match?) and none of the others.

Edit: Gosh, it really isn't my day. As injekt has informed me, the captures are stored in separate arrays. How do I get the offset of each capture within the original string? I would like to take the offset of a capture and then surround the captured text with another string. Or is that what gsub is for? (I thought that only replaced the whole match, not a capture group.)

Hopefully final edit: Right, let me just start this again :P

So, I have a string. The user will use a configuration file to enter a regular expression, then a style associated with each capture group. I need to be able to scan the entire string and get the start and finish or offset and size of each group match.

So if a user had configured ([\\w-\\.]+)@((?:[\\w]+\\.)+)([a-zA-Z]{2,4}) (email address) then I should be able to get:

[ ["elliotpotts", 0,  11],
  ["sample.",     12, 7],
  ["com",         19, 3] ]

from the string: "elliotpotts@sample.com"

If that is not clear, there is simply something wrong with me :P. Thanks a lot so far guys, and thank you for being so patient!

Because your capture group matches only a single character. (.)+ repeats a one-character group, whereas (.+) puts the repetition inside the group, so the two are not the same:

>> /(.)+?/.match("Many many characters!").captures
=> ["M"]
>> /(.+)?/.match("Many many characters!").captures
=> ["Many many characters!"]
>> /(.+?)/.match("Many many characters!").captures
=> ["M"]
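To make the difference concrete, here is a small sketch on a shorter string ("abc" is my own sample, not from the question). When a group is repeated, only the text from its last iteration is kept:

```ruby
# A repeated group keeps only the text of its final iteration.
greedy_repeat = /(.)+/.match("abc").captures   # group runs three times, keeps the last
lazy_repeat   = /(.)+?/.match("abc").captures  # lazy: stops after one iteration
whole_run     = /(.+)/.match("abc").captures   # repetition is inside the group

p greedy_repeat  # => ["c"]
p lazy_repeat    # => ["a"]
p whole_run      # => ["abc"]
```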

If you want to match every character individually, use String#scan, or String#split if you don't care about capture groups.

Using scan:

"Many many characters!".scan(/./)
#=> ["M", "a", "n", "y", " ", "m", "a", "n", "y", " ", "c", "h", "a", "r", "a", "c", "t", "e", "r", "s", "!"]

Note that other answers use (.). That's fine if you care about the capture group, but it's a little pointless if you don't, because scan will then return every character in its own separate Array, like this:

[["M"], ["a"], ["n"], ["y"], [" "], ["m"], ["a"], ["n"], ["y"], [" "], ["c"], ["h"], ["a"], ["r"], ["a"], ["c"], ["t"], ["e"], ["r"], ["s"], ["!"]]

Otherwise, just use split with an empty separator: "Many many characters!".split('')

EDIT In reply to your edit:

reg = /([\w-\.]+)@((?:[\w]+\.)+)([a-zA-Z]{2,4})/
str = "elliotpotts@sample.com"
str.scan(reg).flatten.map { |capture| [capture, str.index(capture), capture.size] }
#=> [["elliotpotts", 0, 11], ["sample.", 12, 7], ["com", 19, 3]]

Oh, and you don't need scan here. There is only one match in the example you provided, so there is nothing to traverse:

str.match(reg).captures.map { |capture| [capture, str.index(capture), capture.size] }

This will also work.
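One caveat with String#index: it returns the first place the captured text occurs in the string, which is not necessarily where the group matched. A sketch of an alternative that reads positions from the match itself, using Regexp.last_match and MatchData#begin (the helper name capture_spans is made up for illustration, and the character class is written as [\w\-.] to avoid an ambiguous unescaped "-"):

```ruby
# Hypothetical helper: returns [text, offset, size] for every capture group
# of every match of regex in str.
def capture_spans(str, regex)
  spans = []
  str.scan(regex) do
    m = Regexp.last_match            # MatchData for the current match
    (1...m.size).each do |i|         # index 0 is the whole match, so skip it
      next if m[i].nil?              # optional group that did not participate
      spans << [m[i], m.begin(i), m[i].size]
    end
  end
  spans
end

reg = /([\w\-.]+)@((?:\w+\.)+)([a-zA-Z]{2,4})/
email_spans = capture_spans("elliotpotts@sample.com", reg)
p email_spans
# => [["elliotpotts", 0, 11], ["sample.", 12, 7], ["com", 19, 3]]

# Here "com" occurs twice, so String#index would report 0 for the last group.
tricky_spans = capture_spans("com@sample.com", reg)
p tricky_spans
# => [["com", 0, 3], ["sample.", 4, 7], ["com", 11, 3]]
```

Because it goes through scan, the same helper also covers input that contains more than one match.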

Yes, something important was missed ;-)

(...) only introduces ONE capture group: the number of times the group matches is irrelevant as the index is determined only by the regular expression itself and not the input.

The key is a "global regular expression", which applies the regular expression repeatedly, moving through the input in order. In Ruby this is done by switching from Regexp#match to String#scan (many other languages have a "/g" regular expression modifier instead):

"Many many characters!".scan(/(.)+?/)
# but more simply (or see answers using String#split)
"Many many characters!".scan(/(.)/)

Happy coding

It's only returning one character because that's all you've asked it to match. You probably want to use scan instead:

str = "Many many characters!"
matches = str.scan(/(.)/)
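As a rough illustration of what scan returns (the sample string and patterns below are my own): when the pattern has capture groups, each match contributes one array of captures; without groups, each match contributes a plain string.

```ruby
pairs  = "a1 b2".scan(/(\w)(\d)/)  # groups present: one array of captures per match
wholes = "a1 b2".scan(/\w\d/)      # no groups: one string per match

p pairs   # => [["a", "1"], ["b", "2"]]
p wholes  # => ["a1", "b2"]
```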

The following code is adapted from the question "Get index of string scan results in ruby", modified to my liking.

[].tap {|results|
    "abab".scan(/a/) {|capture|
        results.push(([capture, Regexp::last_match.offset(0)]).flatten)
    }
}

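=> [["a", 0, 1], ["a", 2, 3]]

(offset(0) returns both the start and the end of the match, so each entry is [text, start, end].)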
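Building on the same idea, here is a minimal sketch of the stated goal of wrapping each capture group in an HTML element. The helper name wrap_captures, the choice of span elements, and the class names are all hypothetical. It assumes capture groups are not nested and does no HTML escaping of the input.

```ruby
# Hypothetical helper: wraps each capture group of each match in a <span>
# whose class comes from classes (one class name per group, in order).
# Assumes groups are not nested and the text needs no HTML escaping.
def wrap_captures(str, regex, classes)
  spans = []
  str.scan(regex) do
    m = Regexp.last_match
    (1...m.size).each do |i|
      spans << [m.begin(i), m.end(i), classes[i - 1]] unless m[i].nil?
    end
  end

  result = str.dup
  # Work from the end of the string so earlier offsets stay valid.
  spans.sort_by { |start, _, _| -start }.each do |start, finish, css|
    result[start...finish] = "<span class=\"#{css}\">#{result[start...finish]}</span>"
  end
  result
end

html = wrap_captures("ab ab", /(a)(b)/, %w[first second])
puts html
```

Inserting from the back means each replacement only shifts text that has already been processed, so the recorded offsets never need adjusting.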
