Split a large regular expression across multiple lines

Question

I have this regular expression:

INVALID_NAMES = /\b(bib$|costumes$|httpanties?|necklace|cuff link|cufflink|scarf|pendant|apron|buckle|beanie|hat|ring|blanket|polo|earrings?|plush|pacifier|tie$|panties|boxers?|slippers?|pants?|leggings|ibattz|dress|bodysuits?|charm|battstation|tea|pocket ref|pajamas?|boyshorts?|mimopowertube|coat|bathrobe)\b/i

It works as written, but I want to write something like this:

INVALID_NAMES = /\b(bib$|costumes$|httpanties?|necklace|cuff link|
                    cufflink|scarf|pendant|apron|buckle|beanie|hat|ring|blanket|
                    polo|earrings?|plush|pacifier|tie$|panties|boxers?|
                    slippers?|pants?|leggings|ibattz|dress|bodysuits?|charm|
                    battstation|tea|pocket ref|pajamas?|boyshorts?|
                    mimopowertube|coat|bathrobe)\b/i

But with the second form, the words cufflink, polo, slippers?, battstation and mimopowertube are not matched, because of the whitespace that now precedes each of them. For example:

(this space before the word)cufflink

I'd be very grateful for any help.

Answer 1

You may use something like this:

INVALID_NAMES = [
  "bib$",
  "costumes$",
  "httpanties?",
  "necklace"
]
INVALID_NAMES_REGEX = /\b(#{INVALID_NAMES.join '|'})\b/i
p INVALID_NAMES_REGEX
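
As a minimal sketch of how this could be used for matching, the version below adds the two cuff link spellings from the question and a small helper. The names INVALID_NAME_PATTERNS and invalid_name? are hypothetical and chosen only for illustration. Each array entry is treated as regex source, so ? and $ keep their regex meaning after the join.

```ruby
# Alternatives kept as plain strings, one per line. Each string is regex
# source, so metacharacters such as ? and $ keep their regex meaning.
# (INVALID_NAME_PATTERNS and invalid_name? are hypothetical names.)
INVALID_NAME_PATTERNS = [
  'bib$',
  'costumes$',
  'httpanties?',
  'necklace',
  'cuff link',
  'cufflink'
].freeze

# Interpolate the joined alternatives into one case-insensitive regex.
INVALID_NAMES_REGEX = /\b(#{INVALID_NAME_PATTERNS.join('|')})\b/i

def invalid_name?(name)
  INVALID_NAMES_REGEX.match?(name)
end

puts invalid_name?('Silver Cufflink set') # true
puts invalid_name?('cuff link box')       # true
puts invalid_name?('baby bib')            # true
puts invalid_name?('bib holder')          # false, bib$ only matches at end of line
```

Because a space inside a quoted string is kept literally, 'cuff link' needs no escaping here.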

Answer 2

Construct Your Regex with the Space-Insensitive Flag

You can use the space-insensitive (extended, /x) flag to ignore whitespace and comments in your regular expression. Note that once you enable this flag you will need \s or another explicit escape to match whitespace, since /x otherwise causes literal spaces in the pattern to be ignored.

Consider the following example:

INVALID_NAMES =
    /\b(bib$          |
        costumes$     |
        httpanties?   |
        necklace      |
        cuff\slink    |
        cufflink      |
        scarf         |
        pendant       |
        apron         |
        buckle        |
        beanie        |
        hat           |
        ring          |
        blanket       |
        polo          |
        earrings?     |
        plush         |
        pacifier      |
        tie$          |
        panties       |
        boxers?       |
        slippers?     |
        pants?        |
        leggings      |
        ibattz        |
        dress         |
        bodysuits?    |
        charm         |
        battstation   |
        tea           |
        pocket\sref   |
        pajamas?      |
        boyshorts?    |
        mimopowertube |
        coat          |
        bathrobe
    )\b/ix

Note that you can format it in many other ways, but having one expression per line makes it easier to sort and edit your sub-expressions. If you want it to have multiple alternatives per line, you could certainly do that.
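
A sketch of the multiple-alternatives-per-line layout, using a shortened subset of the list and grouping it by kind. The constant name INVALID_NAMES_X and the grouping are assumptions made for illustration. Under /x a # starts a comment that runs to the end of the line, so each row can be annotated.

```ruby
# Several alternatives per line under /x, with a comment on each row.
# Spaces inside names are written as \s because /x drops unescaped whitespace.
# (INVALID_NAMES_X is a hypothetical name; the list is a shortened subset.)
INVALID_NAMES_X = /
  \b(
    bib$ | costumes$ | tie$                  # must end the line
    | cuff\slink | cufflink | pocket\sref    # multi-word names
    | earrings? | boxers? | slippers?        # optional plural
  )\b
/ix

p INVALID_NAMES_X.match?('cuff link')   # true
p INVALID_NAMES_X.match?('pocket ref')  # true
p INVALID_NAMES_X.match?('cuff  link')  # false, \s matches exactly one character
```

Keep in mind that \s matches any single whitespace character (tab and newline included), not only a space.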

Making Sure It Works

You can see that the expression above works as intended with the following examples:

'cufflink'.match INVALID_NAMES
#=> #<MatchData "cufflink" 1:"cufflink">

'cuff link'.match INVALID_NAMES
#=> #<MatchData "cuff link" 1:"cuff link">

Answer 3

When you add a newline in the middle of a regex literal, the newline becomes a part of the regular expression. Look at this example:

"ab" =~ /ab/ # => 0

"ab" =~ /a
b/ # => nil

"a\nb" =~ /a
b/ # => 0

You can suppress the newline by appending a backslash at the end of the line:

"ab" =~ /a\
b/ # => 0

Applied to your regex (leading spaces also removed):

INVALID_NAMES = /\b(bib$|costumes$|httpanties?|necklace|cuff link|\
cufflink|scarf|pendant|apron|buckle|beanie|hat|ring|blanket|\
polo|earrings?|plush|pacifier|tie$|panties|boxers?|\
slippers?|pants?|leggings|ibattz|dress|bodysuits?|charm|\
battstation|tea|pocket ref|pajamas?|boyshorts?|\
mimopowertube|coat|bathrobe)\b/i
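
A quick way to check this, sketched on a three-word subset, is to compare the source of the continued literal with the single-line one. The constant names SINGLE_LINE, CONTINUED and BROKEN are hypothetical.

```ruby
SINGLE_LINE = /\b(cuff link|cufflink|polo)\b/i

# Each line ends with a backslash and continuation lines start at column 0,
# so neither the newline nor any indentation ends up in the pattern.
CONTINUED = /\b(cuff link|\
cufflink|\
polo)\b/i

# Without the backslash, the newline and the indentation become part of
# the second alternative.
BROKEN = /\b(cuff link|
  cufflink)\b/i

p CONTINUED.source == SINGLE_LINE.source # true
p CONTINUED.match?('cufflink')           # true
p BROKEN.source.include?("\n")           # true
p BROKEN.match?('cufflink')              # false
```

The trade-off is that continuation lines cannot be indented, since any leading spaces would still be part of the pattern.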

Answer 4

You might do it like this:

INVALID_NAMES = ['necklace', 'cuff link', 'cufflink', 'scarf', 'tie?', 'bib$']
r = Regexp.union(INVALID_NAMES.map { |n| /\b#{n}\b/i })

str = 'cat \n  cufflink bib cuff link. tie Scarf\n cow necklace? \n  ti. bib'
str.scan(r)
  #=> ["cufflink", "cuff link", "tie", "Scarf", "necklace", "ti", "bib"]
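
The map to Regexp objects matters here: Regexp.union escapes plain strings, so passing the names directly would turn ? and $ into literal characters. A small sketch of the difference, with hypothetical variable names:

```ruby
# Strings passed to Regexp.union are escaped, so ? and $ become literals.
from_strings = Regexp.union('tie?', 'bib$')

# Regexp objects are combined as they are and keep their own flags.
from_regexps = Regexp.union(['tie?', 'bib$'].map { |n| /\b#{n}\b/i })

p from_strings.source             # "tie\\?|bib\\$"
p from_strings.match?('tie')      # false, needs a literal "tie?"
p from_regexps.match?('TIE')      # true
p from_regexps.match?('baby bib') # true
```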

Answer 5

Your patterns are inefficient and will cause the Regexp engine to thrash badly.

I'd recommend you investigate what Perl's Regexp::Assemble can do to help your Ruby code:

5 answers

solution1 3 2014-12-07 23:15:09
solution2 2 ACCEPTED 2014-12-08 00:36:37
solution3 1 2014-12-07 23:14:09
solution4 0 2014-12-07 23:20:46
solution5 0 2014-12-08 05:26:00
