
Ruby Regex: empty space at beginning and end of string

I want to find all users whose first name has a space at the beginning or the end. It could look like "Juliette " or " Juliette". For now I only have a regex that matches when the space is at the end of the string:

^[ab]:[[:space:]]|$

I couldn't find how to match a space at the beginning of the string, and I don't know whether it's possible to cover both conditions in one regex. Thanks for your help.

Test for Strippable Whitespace without Regexp

There's a little trick you can use with String#strip!, which returns nil if it finds no leading or trailing whitespace to strip. For example:

# maps str to true if it has leading/trailing whitespace;
# otherwise maps it to false
def strippable?(str)
  { str => !!str.dup.strip! }
end

# leading space, trailing space, no space
test_values = [ ' foo', 'foo ', 'foo' ]

test_values.map { |str| strippable? str }
#=> [{" foo"=>true}, {"foo "=>true}, {"foo"=>false}]

This doesn't rely on a regular expression, but rather on the nil-or-String return value of String#strip!, which the double negation (!!) turns into a Boolean. Regardless of whether the Ruby engine uses regular expressions under the hood, these types of String methods are often faster than comparable Regexp matches, but your mileage and specific use cases may vary.
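To make the nil-versus-String behavior concrete, here is a minimal sketch. The helper name padded? and the sample names are hypothetical, not part of the answer; the helper also swaps in a non-mutating comparison against String#strip as an alternative to the !! trick.

```ruby
# String#strip! returns nil when nothing was stripped, otherwise the receiver.
# .dup protects the caller's string (and also works on frozen literals).
p ' foo'.dup.strip!   # "foo"
p 'foo'.dup.strip!    # nil

# Hypothetical helper: true when str has leading or trailing whitespace.
def padded?(str)
  str != str.strip
end

# Hypothetical sample data modeled on the question.
names = ['Juliette ', ' Juliette', 'Juliette']
padded_names = names.select { |name| padded?(name) }
p padded_names        # ["Juliette ", " Juliette"]
```

Returning a plain Boolean rather than a Hash makes the helper usable directly with select, reject, or any?.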

Alternatives with Regexp

Using the same test data as above, you could do something similar with a regular expression. For example:

# leading space, trailing space, no space
test_values = [ ' foo', 'foo ', 'foo' ]

# test start/end of string
test_values.grep(/\A\s+|\s+\z/)
#=> [" foo", "foo "]

# test start/end of line
test_values.grep(/^\s+|\s+$/)
#=> [" foo", "foo "]
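Both patterns return the same result for the single-line test data, but they diverge once a string contains an embedded newline. The following sketch illustrates the difference; the constant names and the sample string are made up for illustration.

```ruby
STRING_ANCHORED = /\A\s+|\s+\z/  # start/end of the whole string
LINE_ANCHORED   = /^\s+|\s+$/    # start/end of any line inside the string

# No padding at either end of the string, but the second line starts with a space.
multiline = "foo\n bar"

p STRING_ANCHORED.match?(multiline)  # false
p LINE_ANCHORED.match?(multiline)    # true
```

For a field such as a first name, the string anchors (\A and \z) are usually the safer choice, since they only look at the very start and very end of the value.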

Benchmarks

require 'benchmark'

ITERATIONS  = 1_000_000
TEST_VALUES = [ ' foo', 'foo ', 'foo' ]

def regex_grep array
  array.grep /^\s+|\s+$/
end

def string_strip array
  array.map { |str| { str => !!str.dup.strip! } }
end

Benchmark.bmbm do |x|
  n = ITERATIONS
  x.report('regex') { n.times { regex_grep   TEST_VALUES } }
  x.report('strip') { n.times { string_strip TEST_VALUES } }
end
            user     system      total        real
regex   1.539269   0.001325   1.540594 (  1.541438)
strip   1.256836   0.001357   1.258193 (  1.259955)

Roughly a quarter second over a million iterations may not seem like a big difference, but on significantly larger data sets or iteration counts it can add up. Whether that's enough for you to care about in this particular use case is up to you, but the general pattern is that native String methods (regardless of how the interpreter implements them under the hood) tend to be faster than regular expression pattern matching. Of course there are edge cases, but that's what benchmarks are for!

You can use

/\A([a-zA-Z]+ | [a-zA-Z]+)\z/
/\A(?:[[:alpha:]]+[[:space:]]|[[:space:]][[:alpha:]]+)\z/
/\A(?:\p{L}+[\p{Z}\t]|[\p{Z}\t]\p{L}+)\z/

See the Rubular demo (which uses line anchors instead of string anchors for demonstration purposes).

Details:

  • \A - a string start anchor
  • (...) - a capturing group
  • (?:...) - a non-capturing group (it is preferred here since you are not extracting, just validating)
  • [a-zA-Z]+ - any one or more ASCII letters
  • \p{L}+ - any one or more Unicode letters
  • | - or
  • \z - end of string anchor.
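As a rough usage sketch, the third pattern can be applied to a list of names with Array#grep. The constant name PADDED_NAME and the sample data are hypothetical; the pattern itself is copied from the answer.

```ruby
# Third pattern from the answer: one or more letters with exactly one
# separator/tab character at either the start or the end of the string.
PADDED_NAME = /\A(?:\p{L}+[\p{Z}\t]|[\p{Z}\t]\p{L}+)\z/

# Hypothetical sample data.
first_names = ['Juliette ', ' Juliette', 'Juliette', 'Zoé ', ' Zoé ']

matches = first_names.grep(PADDED_NAME)
p matches
# keeps 'Juliette ', ' Juliette' and 'Zoé '; drops 'Juliette' and ' Zoé '
```

Note that ' Zoé ' is rejected because each alternative allows whitespace on one side only, and that names with internal spaces or more than one padding character would not match either. If those cases matter, the looser /\A\s+|\s+\z/ test may be a better fit.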


 