
Using String#split Method

By default, the #split method works as follows:

"id,name,title(first_name,last_name)".split(",")

which gives the following output:

["id", "name", "title(first_name", "last_name)"]

But I want the following:

["id", "name", "title(first_name,last_name)"]

So I use the following regex (from this answer) with split to get the desired output:

"id,name,title(first_name,last_name)".split(/,(?![^(]*\))/)

But when I use another string, which is my actual input, the logic fails. My actual string is:

"id,name,title(first_name,last_name,address(street,pincode(id,code)))"

and it gives the following output:

["id", "name", "title(first_name", "last_name", "address(street", "pincode(id,code)))"]

rather than

["id", "name", "title(first_name,last_name,address(street,pincode(id,code)))"]
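The reason the regex stops working seems to be its negative lookahead, `(?![^(]*\))`: it only refuses to split when a `)` is reached before any `(`. A comma inside the outer parentheses that is followed by another `(` therefore looks like a top-level comma. A minimal sketch reproducing both cases from the question (the constant names are only for illustration):

```ruby
# The regex from the question: split on a comma unless a ")" comes
# before the next "(" further along in the string.
NO_NESTING = /,(?![^(]*\))/

FLAT   = "id,name,title(first_name,last_name)"
NESTED = "id,name,title(first_name,last_name,address(street,pincode(id,code)))"

p FLAT.split(NO_NESTING)
# => ["id", "name", "title(first_name,last_name)"]

# The comma after "first_name" is followed by "last_name,address(",
# so the lookahead meets "(" before ")" and the comma is split on.
p NESTED.split(NO_NESTING)
# => ["id", "name", "title(first_name", "last_name", "address(street", "pincode(id,code)))"]
```

Because the lookahead cannot count how many parentheses are open, handling arbitrary nesting needs some form of depth tracking.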

Updated Answer

Since the earlier answer didn't handle all the cases, as rightly pointed out in the comments, I'm updating it with another solution.

This approach replaces each valid (top-level) comma with a separator, |, and then splits the string on that separator using String#split.

class TokenArrayParser
  SPLIT_CHAR = '|'.freeze

  def initialize(str)
    @str = str
  end

  def parse
    separate_on_valid_comma.split(SPLIT_CHAR)
  end

  private

  def separate_on_valid_comma
    dup = @str.dup
    paren_count = 0
    dup.length.times do |idx|
      case dup[idx]
      when '(' then paren_count += 1
      when ')' then paren_count -= 1
      when ',' then dup[idx] = SPLIT_CHAR if paren_count.zero?
      end
    end

    dup
  end
end

%w(
  id,name,title(first_name,last_name)
  id,name,title(first_name,last_name,address(street,pincode(id,code)))
  first_name,last_name,address(street,pincode(id,code)),city(name)
  a,b(c(d),e,f)
  id,name,title(first_name,last_name),pub(name,address)
).each {|str| puts TokenArrayParser.new(str).parse.inspect }

# =>
# ["id", "name", "title(first_name,last_name)"]
# ["id", "name", "title(first_name,last_name,address(street,pincode(id,code)))"]
# ["first_name", "last_name", "address(street,pincode(id,code))", "city(name)"]
# ["a", "b(c(d),e,f)"]
# ["id", "name", "title(first_name,last_name)", "pub(name,address)"]

I'm sure this can be optimized further.
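One possible direction, as a rough sketch: collect the tokens directly while tracking the parenthesis depth, so no placeholder character is needed and input that happens to contain `|` still splits correctly. The method name `split_top_level` is hypothetical, and the sketch assumes balanced parentheses (it does not validate them).

```ruby
# Hypothetical helper: split on commas that sit outside all parentheses.
# Assumes the parentheses in str are balanced; nothing is validated here.
def split_top_level(str)
  tokens  = []
  current = +""   # unfrozen buffer for the token being built
  depth   = 0     # number of currently open parentheses

  str.each_char do |ch|
    case ch
    when "(" then depth += 1
    when ")" then depth -= 1
    end

    if ch == "," && depth.zero?
      tokens << current   # top-level comma: close the current token
      current = +""
    else
      current << ch       # everything else belongs to the token
    end
  end

  tokens << current       # the last token has no trailing comma
end

p split_top_level("a,b(c(d),e,f)")
# => ["a", "b(c(d),e,f)"]
p split_top_level("x|y,z(1,2)")
# => ["x|y", "z(1,2)"]
```

Unlike String#split, this sketch returns `[""]` for an empty string and keeps trailing empty fields, which may or may not be what you want.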

def doit(str)
  split_here = 0.chr
  stack = 0
  s = str.gsub(/./) do |c|
    ret = c
    case c
    when '('
      stack += 1
    when ','
      ret = split_here if stack.zero?
    when ')'
      raise(RuntimeError, "parens are unbalanced") if stack.zero?
      stack -= 1
    end
    ret
  end
  raise(RuntimeError, "parens are unbalanced, stack at end=#{stack}") if stack > 0
  s.split(split_here)
end

doit "id,name,title(first_name,last_name)"
  #=> ["id", "name", "title(first_name,last_name)"]
doit "id,name,title(first_name,last_name,address(street,pincode(id,code)))"
  #=> ["id", "name", "title(first_name,last_name,address(street,pincode(id,code)))"]
doit "a,b(c(d),e,f)"
  #=> ["a", "b(c(d),e,f)"]
doit "id,name,title(first_name,last_name),pub(name,address)"
  #=> ["id", "name", "title(first_name,last_name)", "pub(name,address)"]
doit "a,b(c)d),e,f)"
  #=> RuntimeError: parens are unbalanced
doit "a,b(c(d),e),f("
  #=> RuntimeError: parens are unbalanced, stack at end=1

A comma is split on if and only if stack is zero when it is encountered. Such a comma is replaced with a character (split_here) that does not occur in the string (I used 0.chr). The string is then split on split_here.
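If relying on a character that is absent from the string is a concern, a different route might be to match the tokens rather than the separators, using the recursive subexpression call (`\g<name>`) that Ruby's regex engine supports. This is only a sketch: `TOP_LEVEL_TOKEN`, the group name `group`, and `top_level_tokens` are names made up here, and unlike `doit` it does not raise on unbalanced parentheses and it drops empty fields.

```ruby
# A token is a run of: plain characters (no comma, no paren), or a
# balanced (...) group, which may contain commas and nested groups.
TOP_LEVEL_TOKEN = /(?:[^,()]+|(?<group>\((?:[^()]+|\g<group>)*\)))+/

# Hypothetical helper: return every top-level token found in str.
def top_level_tokens(str)
  tokens = []
  # scan would return the named captures, so collect the full match instead.
  str.scan(TOP_LEVEL_TOKEN) { tokens << Regexp.last_match(0) }
  tokens
end

p top_level_tokens("a,b(c(d),e,f)")
# => ["a", "b(c(d),e,f)"]
p top_level_tokens("id,name,title(first_name,last_name),pub(name,address)")
# => ["id", "name", "title(first_name,last_name)", "pub(name,address)"]
```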

This could be one approach:

"id,name,title(first_name,last_name)".split(",")[0..1] << "id,name,title(first_name,last_name)".split(",")[-2..-1].join(",")

This splits the string twice, takes the first two elements of the first result, and appends the last two elements of the second result rejoined with a comma. It depends on the exact shape of this input (two plain fields followed by one group containing a single comma), so it will not handle the nested string. At least in this specific scenario it gives you the desired result.
