Split by multiple delimiters

I'm receiving a string that contains two numbers in a handful of different formats:

"344, 345", "334,433", "345x532" and "432 345"

I need to split them into two separate numbers in an array using split, and then convert each one with Integer(num).

What I've tried so far:

nums.split(/[\s+,x]/) # split on one or more spaces, a comma or x

However, it doesn't seem to match multiple spaces when testing. It also doesn't allow a space in the comma version shown above ("344, 345").

How can I match multiple delimiters?

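You are using a character class in your pattern, and a character class matches only one character. [\s+,x] matches a single whitespace character, a literal +, a comma, or an x. You probably meant an alternation such as (?:\s*,\s*|\s+|x), where the + applies to the whitespace instead of being a literal inside the brackets.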
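To see the difference concretely, the snippet below splits two inputs from the question with the asker's character class and with an alternation. The constant names CLASS_PATTERN and ALT_PATTERN are hypothetical, and ALT_PATTERN is one possible alternation (an assumption, not quoted from the answer) that covers a whitespace run, a comma with optional surrounding spaces, or an x.

```ruby
# The asker's pattern: a character class matches exactly ONE character,
# and inside the brackets "+" is only a literal plus sign.
CLASS_PATTERN = /[\s+,x]/

# Hypothetical alternation: quantifiers apply to whole alternatives,
# so \s+ can consume a run of spaces and \s*,\s* a comma with spaces.
ALT_PATTERN = /(?:\s*,\s*|\s+|x)/

p "344, 345".split(CLASS_PATTERN) # => ["344", "", "345"]
p "432  345".split(CLASS_PATTERN) # => ["432", "", "345"]
p "344, 345".split(ALT_PATTERN)   # => ["344", "345"]
p "432  345".split(ALT_PATTERN)   # => ["432", "345"]

# The empty middle piece is what breaks the Integer(num) conversion.
begin
  "344, 345".split(CLASS_PATTERN).map { |n| Integer(n) }
rescue ArgumentError => e
  puts "Integer() rejected a piece: #{e.message}"
end
```

Each comma or space is a separate one-character match for the class, so two adjacent separators leave an empty string between them.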

However, a simple \D+ (one or more non-digit characters) should suffice:

"345, 456".split(/\D+/).map(&:to_i) #=> [345, 456]
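Since the question asks for Integer(num) rather than to_i, here is a minimal sketch of the same \D+ split using Integer with an explicit base 10, run over the four formats from the question. The method name split_numbers is hypothetical. Passing base 10 is a choice made here: without it, Integer("08") raises because the leading 0 is read as an octal prefix.

```ruby
# Hypothetical helper: split on any run of non-digits, then convert each
# piece with Integer(). Base 10 keeps zero-padded pieces such as "08"
# from being treated as octal; unlike to_i, Integer raises on bad pieces.
def split_numbers(str)
  str.split(/\D+/).map { |piece| Integer(piece, 10) }
end

["344, 345", "334,433", "345x532", "432 345"].each do |s|
  p split_numbers(s)
end
# [344, 345]
# [334, 433]
# [345, 532]
# [432, 345]
```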
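
A stricter alternative validates the whole string against the expected formats before splitting it:

# R1 matches any one of the allowed separators. ", " is listed before ","
# so that the comma-plus-space form is consumed as a single separator.
R1 = Regexp.union([", ", ",", "x", " "])
  #=> /,\ |,|x|\ /
# R2 accepts only: digits, exactly one separator, digits.
R2 = /\A\d+#{R1}\d+\z/
  #=> /\A\d+(?-mix:,\ |,|x|\ )\d+\z/

def split_it(s)
  return nil unless s =~ R2 # reject anything that is not "digits SEP digits"
  s.split(R1).map(&:to_i)
end

split_it("344, 345") #=> [344, 345]
split_it("334,433")  #=> [334, 433]
split_it("345x532")  #=> [345, 532]
split_it("432 345")  #=> [432, 345]
split_it("432&345")  #=> nil
split_it("x32 345")  #=> nil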
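Because that approach already matches the whole string once, a single anchored regex with capture groups can also pull the two numbers out directly. This is a sketch of an alternative, not part of the answer above; PAIR and parse_pair are hypothetical names, and the separators are the same four the question lists.

```ruby
# Hypothetical: one anchored pattern that both validates and extracts.
# Group 1 and group 2 capture the digit runs on either side of the separator.
PAIR = /\A(\d+)(?:, |,|x| )(\d+)\z/

def parse_pair(s)
  m = PAIR.match(s)
  # Regexp#match returns nil on failure, so non-matching input yields nil.
  m && m.captures.map { |d| Integer(d, 10) }
end

p parse_pair("344, 345") # => [344, 345]
p parse_pair("432&345")  # => nil
```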

Your original regex would work with a minor adjustment to move the '+' symbol outside the character class:

"344 ,x  345".split(/[\s,x]+/).map(&:to_i) #=> [344, 345]

If the examples are actually the only formats that you'll encounter, this will work well. However, if you have to be more flexible and accommodate unknown separators between the numbers, you're better off with the answer given by Wiktor:

"344 ,x  345".split(/\D+/).map(&:to_i) #=> [344, 345]

Both versions return an array of Integers for the inputs given; however, the second one is both more robust and easier to understand at a glance.
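One caveat applies to both one-liners: to_i never raises, so malformed input can quietly become 0. A minimal sketch with a string that starts with a non-digit (a hypothetical input, not one of the question's formats) shows how that differs from the Integer(num) conversion the question mentions:

```ruby
# A leading separator makes split produce a leading empty string.
pieces = "x32 345".split(/\D+/)
p pieces             # => ["", "32", "345"]

# to_i silently turns the empty string into 0 ...
p pieces.map(&:to_i) # => [0, 32, 345]

# ... whereas Integer() raises ArgumentError, surfacing the bad input.
begin
  pieces.map { |n| Integer(n, 10) }
rescue ArgumentError
  puts "rejected: #{pieces.inspect}"
end
```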

it doesn't seem to match multiple spaces when testing

Yeah, a character class (square brackets) doesn't work like that. A quantifier applies to the class as a whole, not to the individual characters inside it; within the brackets, + is just a literal plus sign. You could use the | operator instead, something like this:

nums.split(%r[\s+|,\s*|x])
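A sketch applying that alternation to the question's four formats, plus a hypothetical run of several spaces, and converting with Integer() as the question asks. SEPARATOR is an illustrative name.

```ruby
# Alternation: a run of whitespace, a comma with optional trailing
# whitespace, or a literal x. Each branch is tried as a whole.
SEPARATOR = %r[\s+|,\s*|x]

inputs = ["344, 345", "334,433", "345x532", "432 345", "432   345"]
inputs.each do |s|
  p s.split(SEPARATOR).map { |n| Integer(n, 10) }
end
```

A space before the comma, as in "344 , 345", is matched by \s+ and then by ,\s* as two separate separators, so split returns ["344", "", "345"] and Integer() raises on the empty piece.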
