Remove unmatched parentheses from a string

I want to remove "un-partnered" parentheses from a string.

I.e., every ( should be removed unless it is followed by a ) somewhere later in the string. Likewise, every ) that is not preceded by a ( somewhere earlier in the string should be removed.

Ideally the algorithm would take nesting into account as well.

E.g.:

"(a)".remove_unmatched_parens # => "(a)"
"a(".remove_unmatched_parens # => "a"
")a(".remove_unmatched_parens # => "a"

Instead of a regex, consider a pushdown automaton, perhaps. (I'm not sure whether Ruby regular expressions can handle this; I believe Perl's can.)

A (very simplified) process may be:

For each character in the input string:

  1. If it is not a '(' or ')', just append it to the output.
  2. If it is a '(', increase a seen_parens counter and append it.
  3. If it is a ')' and seen_parens is > 0, append it and decrease seen_parens. Otherwise skip it.

At the end of the process, if seen_parens is > 0, remove that many '(' characters from the output, starting from the end. (This step can be merged into the above process with the use of a stack or recursion.)
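
The process above might be sketched in Ruby roughly as follows. The helper name strip_unmatched_parens is an assumption made for illustration, and this is a minimal sketch of the counter-based idea rather than a tuned implementation.

```ruby
# Hypothetical helper name; follows the counter-based steps described above.
def strip_unmatched_parens(input)
  output = +""        # unfrozen, mutable result string
  seen_parens = 0     # number of "(" appended that still lack a partner

  input.each_char do |c|
    case c
    when "("
      seen_parens += 1
      output << c
    when ")"
      # Keep a ")" only if an earlier "(" is waiting for it.
      if seen_parens > 0
        output << c
        seen_parens -= 1
      end
    else
      output << c
    end
  end

  # Drop the leftover "(" characters, starting from the end of the output.
  seen_parens.times { output.slice!(output.rindex("(")) }
  output
end

puts strip_unmatched_parens(")a(")
```

Because leftovers are deleted from the end, an input such as "(a(b)" comes out as "(ab)": the result is balanced, but it is the later "(" that gets dropped.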

The entire process is O(n), even if it has relatively high overhead.

Happy coding.

The following uses Oniguruma, the regex engine built into Ruby 1.9. If you are using Ruby 1.8, see: oniguruma.

Update

I had been lazy and just copied and pasted someone else's regex. It seemed to have a problem.

So now I have written my own. I believe it should work now.

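class String
    NonParenChar = /[^\(\)]/
    # Returns the first run of balanced groups and non-paren characters
    # (nil if there is none). String#[] yields only the first match, so
    # text after an unmatched paren is dropped: "a)b" gives "a".
    def remove_unmatched_parens
        self[/
            (?:
                (?<balanced>
                    \(
                        (?:\g<balanced>|#{NonParenChar})*
                    \)
                )
                |#{NonParenChar}
            )+
        /x]
    end
end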
  • (?<name>regex1) names the subregex regex1 as name, which makes it possible to call it.
  • \g<name> is a subregex call that stands for regex1. Note that \g<name> does not represent the particular string that matched regex1; it represents regex1 itself. In fact, it is possible to embed \g<name> within (?<name>...), which is what makes the pattern recursive.
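
As a small illustration of the subexpression call described in the bullets above, the following sketch checks whether a whole string is a single balanced group. The constant name BALANCED and the group name group are made up for this example.

```ruby
# BALANCED is a hypothetical constant used only for this illustration.
# \g<group> re-runs the pattern named "group", so the group can contain itself.
BALANCED = /\A(?<group>\((?:\g<group>|[^()])*\))\z/

puts BALANCED.match?("(a(b)c)")  # true
puts BALANCED.match?("(a(b)c")   # false
```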

Update 2

This is simpler.

class String
    def remove_unmatched_parens
        self[/
            (?<valid>
                \(\g<valid>*\)
                |[^()]
            )+
        /x]
    end
end
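
Both versions above return only the first match, so anything after an unmatched paren is cut off. One possible variant, offered as a sketch and not as the answerer's method, uses gsub instead: balanced groups are kept as they are, and any paren that cannot start a balanced group is deleted. The names STRAY_OR_BALANCED and strip_unmatched_parens_regex are assumptions.

```ruby
# Hypothetical names. The first alternative matches a fully balanced group,
# the second matches a paren that has no partner.
STRAY_OR_BALANCED = /
  (?<balanced> \( (?: \g<balanced> | [^()] )* \) )
  | [()]
/x

def strip_unmatched_parens_regex(str)
  # Keep the text of a balanced group; replace a stray paren with "".
  str.gsub(STRAY_OR_BALANCED) { $~[:balanced].to_s }
end

puts strip_unmatched_parens_regex("a)b")
```

Characters outside parentheses are never matched by the pattern, so gsub leaves them untouched.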

Build a simple LR parser:

tokenize, token, stack = false, "", []

")(a))(()(asdf)(".each_char do |c|
  case c
  when '('
    tokenize = true
    token = c
  when ')'
    if tokenize
      token << c 
      stack << token
    end
    tokenize = false
  when /\w/
    token << c if tokenize
  end
end

result = stack.join

puts result

Running it yields:

wesbailey@feynman:~/code_katas> ruby test.rb
(a)()(asdf)

I don't agree with the folks modifying the String class, because you should never open a standard class. Regexes are pretty brittle for parsing and hard to support. I couldn't imagine coming back to the previous solutions 6 months from now and trying to remember what they were doing!

Here's my solution, based on @pst's algorithm:

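require 'strscan'

class String
  def remove_unmatched_parens
    scanner = StringScanner.new(dup)
    output = ''
    paren_depth = 0

    # getch (rather than get_byte) keeps multibyte characters intact
    while char = scanner.getch
      if char == "("
        paren_depth += 1
        output << char
      elsif char == ")"
        # keep a ")" only when an earlier "(" is still open
        output << char and paren_depth -= 1 if paren_depth > 0
      else
        output << char
      end
    end

    # remove the leftover unmatched "(" characters, last one first
    paren_depth.times{ output.reverse!.sub!('(', '').reverse! }
    output
  end
end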

Algorithm:

  1. Traverse the given string.
  2. While doing that, keep track of the positions of "(" characters in a stack.
  3. If a ")" is found, remove the top element from the stack.
    • If the stack is empty, remove the ")" from the string.
  4. At the end, the stack holds the positions of the unmatched "(" characters, if any; remove those from the string as well.
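
A minimal Ruby sketch of the stack-of-positions steps above could look like this. The method name strip_unmatched_parens_stack is hypothetical, and marking characters as nil before joining is just one way to do the removal.

```ruby
# Hypothetical helper; follows the stack-of-positions steps above.
def strip_unmatched_parens_stack(str)
  chars = str.chars
  open_positions = []   # indices of "(" still waiting for a partner

  chars.each_with_index do |c, i|
    case c
    when "("
      open_positions.push(i)
    when ")"
      if open_positions.empty?
        chars[i] = nil          # ")" with nothing to close: mark for removal
      else
        open_positions.pop      # pairs with the most recent open "("
      end
    end
  end

  # Whatever is left on the stack is an unmatched "(".
  open_positions.each { |i| chars[i] = nil }
  chars.compact.join
end

puts strip_unmatched_parens_stack(")(a))(()(asdf)(")
```

Because the stack records exact positions, this version removes the "(" that really has no partner, so "(a(b)" becomes "a(b)".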

Java code: available at http://a2ajp.blogspot.in/2014/10/remove-unmatched-parenthesis-from-given.html
