Trying to match the hash character fails, but matching any other member of the array succeeds.
Why does this fail?
Thanks,
Joe
UNIT = [ 'floor', 'fl', '#', 'penthouse', 'mezzanine', 'basement', 'room' ]
unit_regex = "\\b(" + UNIT.to_a.join("|") + ")\\b"
unit_regexp = Regexp.new(unit_regex, Regexp::IGNORECASE)
x=unit_regexp.match('#')
As noted in the comments, your problem is that \b is a word boundary inside a regex (unless it is inside a character class, sigh: the \b in /[\b]/ is a backspace, just like in a double-quoted string). A word boundary is, roughly, a position with a word character on one side and either nothing or a non-word character on the other side. But # is not a word character, so the string '#' contains no word boundary at all, /\b/ can't match anywhere in it, and your whole regex fails to match.
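One way to see this in irb is to try the question's construction against a word-character unit and against '#'. This is a small sketch, not part of the original answer, using a reduced UNIT list (an assumption: three entries are enough to show the effect):

```ruby
# Reduced version of the question's list (assumption: three entries are
# enough to show the effect).
UNIT = ['floor', '#', 'room']

# Same construction as the question: \b on both sides of the alternation.
word_bounded = Regexp.new("\\b(" + UNIT.join("|") + ")\\b", Regexp::IGNORECASE)

# 'floor' is made of word characters, so both \b positions exist.
p word_bounded.match('floor')
# '#' contains no word character at all, so there is no \b position: => nil
p word_bounded.match('#')

# Inside a character class, \b means backspace rather than word boundary.
p(/[\b]/.match?("\b"))   # => true ("\b" in double quotes is a backspace)
```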
You're going to have to be more explicit about what you're trying to match. A first stab would be "the beginning of the string or whitespace" instead of the first \b and "the end of the string or whitespace" instead of the second \b. That could be expressed like this:
unit_regex = '(?<=\A|\s)(' + UNIT.to_a.join('|') + ')(?=\z|\s)'
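Plugging that pattern back into the question's code, a sketch like the following shows which inputs it accepts. The sample strings are made up for illustration and are not from the question:

```ruby
UNIT = ['floor', 'fl', '#', 'penthouse', 'mezzanine', 'basement', 'room']

# Single quotes: \A, \s and \z reach Regexp.new without double escaping.
unit_regex = '(?<=\A|\s)(' + UNIT.join('|') + ')(?=\z|\s)'
unit_regexp = Regexp.new(unit_regex, Regexp::IGNORECASE)

# Hypothetical sample inputs.
['#', 'apt # 5', 'Floor 3', 'floors', '#12'].each do |s|
  m = unit_regexp.match(s)
  puts "#{s.inspect} -> #{m ? m[1].inspect : 'no match'}"
end
```

Note that '#12' does not match, because 1 is neither whitespace nor the end of the string; that is the kind of case where the regex may need adjusting for real data.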
Note that I've switched to single quotes to avoid all the double-escaping hassles. The ?<= is a positive lookbehind, which means that (\A|\s) needs to be there but won't be included in the match; similarly, ?= is a positive lookahead. See the manual for more details. Also note that we're using \A rather than ^, since ^ matches the beginning of a line, not of the string; similarly, we use \z instead of $ because \z matches the end of the string whereas $ matches the end of a line.
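The difference between these anchors only shows up with multi-line input. This short sketch, not from the original answer, checks the same literal against both kinds of anchor; the two-line strings are hypothetical:

```ruby
# Hypothetical two-line inputs built from entries of the UNIT list.
text  = "basement\nroom"
text2 = "room\nbasement"

# ^ matches at the start of every line, so it finds "room" on line 2.
p(/^room/.match?(text))    # => true
# \A matches only at the very start of the string.
p(/\Aroom/.match?(text))   # => false

# $ matches at the end of every line, so it finds "room" on line 1.
p(/room$/.match?(text2))   # => true
# \z matches only at the very end of the string.
p(/room\z/.match?(text2))  # => false
```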
You may need to tweak the regex depending on your data but hopefully that will get you started.