Remove unmatched parentheses from a string

Question

I want to remove "un-partnered" parentheses from a string.

I.e., every ( should be removed unless it is followed by a ) somewhere in the string. Likewise, every ) not preceded by a ( somewhere in the string should be removed.

Ideally the algorithm would take nesting into account as well.

E.g.:

"(a)".remove_unmatched_parens # => "(a)"
"a(".remove_unmatched_parens # => "a"
")a(".remove_unmatched_parens # => "a"

Answer 1

Instead of a regex, consider a push-down automaton, perhaps. (I'm not sure whether Ruby regular expressions can handle this; I believe Perl's can.)

A (very simplified) process may be:

For each character in the input string:

If it is not a '(' or ')', just append it to the output.
If it is a '(', increase a seen_parens counter and append it.
If it is a ')' and seen_parens is > 0, append it and decrease seen_parens. Otherwise skip it.

At the end of the process, if seen_parens is > 0, remove that many '(' characters from the output, starting from the end. (This step can be merged into the above process with the use of a stack or recursion.)

The entire process is O(n), even if it has relatively high overhead.
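The steps above can be sketched in Ruby roughly as follows. This is only a minimal illustration, not the answerer's code: the method name `remove_unmatched_with_counter` is made up for this example, and the final cleanup is done as a separate loop rather than merged into the main pass.

```ruby
# Hypothetical helper illustrating the counter-based process described above.
def remove_unmatched_with_counter(input)
  output = +""        # unfrozen, mutable result string
  seen_parens = 0     # number of "(" appended so far that have no ")" yet

  input.each_char do |char|
    case char
    when "("
      seen_parens += 1
      output << char
    when ")"
      # Keep a ")" only if an earlier "(" is still waiting for it.
      if seen_parens > 0
        seen_parens -= 1
        output << char
      end
    else
      output << char
    end
  end

  # Any remaining count is the number of surplus "(" characters;
  # remove that many, starting from the end of the output.
  while seen_parens > 0
    output.slice!(output.rindex("("))
    seen_parens -= 1
  end

  output
end

puts remove_unmatched_with_counter(")a(")
```

Removing the surplus opening parentheses from the end keeps the result balanced, because every closing parenthesis that survived the first pass still has an earlier opening one to pair with.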

Happy coding.

Answer 2

The following uses Oniguruma. Oniguruma is the built-in regex engine if you are using Ruby 1.9. If you are using Ruby 1.8, see this: oniguruma.

Update

I had been lazy and just copied and pasted someone else's regex. It seemed to have a problem.

So now I have written my own. I believe it should work now.

class String
    NonParenChar = /[^\(\)]/
    def remove_unmatched_parens
        self[/
            (?:
                (?<balanced>
                    \(
                        (?:\g<balanced>|#{NonParenChar})*
                    \)
                )
                |#{NonParenChar}
            )+
        /x]
    end
end

(?<name>regex1) names the (sub)regex regex1 as name, and makes it possible to call it.
\g<name> is a subregex that represents regex1. Note that \g<name> does not represent a particular string that matched regex1; it represents regex1 itself. In fact, it is possible to embed \g<name> within (?<name>...).
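As a small illustration of a named group calling itself, here is a sketch that only checks whether a whole string is one balanced parenthesized expression. The method name `balanced_group?` is an assumption made for this example and does not appear in the answer.

```ruby
# Hypothetical helper: true when the whole string is a single balanced
# parenthesized group such as "(a(b)c)".
def balanced_group?(str)
  pattern = /
    \A
    (?<balanced>
      \(
        (?: [^()] | \g<balanced> )*   # plain characters or a nested group
      \)
    )
    \z
  /x
  pattern.match?(str)
end

puts balanced_group?("(a(b)c)")
puts balanced_group?("(a(b)c")
```

Because \g<balanced> re-runs the pattern rather than reusing previously matched text, each nested group may contain different content.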

Update 2

This is simpler.

class String
    def remove_unmatched_parens
        self[/
            (?<valid>
                \(\g<valid>*\)
                |[^()]
            )+
        /x]
    end
end
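One thing to keep in mind is that `String#[]` with a regex returns only the first match, so a stray parenthesis in the middle of the string ends the result early. The sketch below is a possible variant, not the answerer's code: it reuses the same pattern but collects every match with `scan` and joins them. The method name `keep_valid_runs` is made up for this example.

```ruby
# Hypothetical variant: gather every valid run instead of only the first one.
def keep_valid_runs(str)
  pattern = /
    (?<valid>
      \(\g<valid>*\)
      |[^()]
    )+
  /x

  runs = []
  # With a capture group present, scan yields the captures to the block,
  # so the full match is read from Regexp.last_match instead.
  str.scan(pattern) { runs << Regexp.last_match[0] }
  runs.join
end

puts keep_valid_runs("a)b")
puts keep_valid_runs(")(a))(()(asdf)(")
```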

Answer 3

Build a simple LR parser:

tokenize, token, stack = false, "", []

")(a))(()(asdf)(".each_char do |c|
  case c
  when '('
    tokenize = true
    token = c
  when ')'
    if tokenize
      token << c 
      stack << token
    end
    tokenize = false
  when /\w/
    token << c if tokenize
  end
end

result = stack.join

puts result

Running it yields:

wesbailey@feynman:~/code_katas> ruby test.rb
(a)()(asdf)

I don't agree with the folks modifying the String class, because you should never open a standard class. Regexes are pretty brittle for parsing and hard to maintain. I can't imagine coming back to the previous solutions 6 months from now and trying to remember what they were doing!

Answer 4

Here's my solution, based on @pst's algorithm:

require 'strscan' # StringScanner lives in the strscan standard library

class String
  def remove_unmatched_parens
    scanner = StringScanner.new(dup)
    output = ''
    paren_depth = 0

    while char = scanner.get_byte
      if char == "("
        paren_depth += 1
        output << char
      elsif char == ")"
        output << char and paren_depth -= 1 if paren_depth > 0
      else
        output << char
      end
    end

    paren_depth.times{ output.reverse!.sub!('(', '').reverse! }
    output
  end
end

Answer 5

Algorithm:

Traverse the given string.
While doing so, keep track of the positions of "(" in a stack.
If a ")" is found, remove the top element from the stack.
- If the stack is empty, remove the ")" from the string.
At the end, the stack holds the positions of the unmatched parentheses, if any.
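The linked code is in Java; for consistency with the rest of this page, here is a rough Ruby sketch of the same stack idea. It is an illustration under assumptions, not a port of the linked code: the method name `remove_unmatched_with_stack` is made up, and unmatched positions are collected first and dropped in a final pass.

```ruby
require "set"

# Hypothetical helper illustrating the stack-of-positions algorithm above.
def remove_unmatched_with_stack(input)
  open_positions = []   # indices of "(" still waiting for a ")"
  stray_closers = []    # indices of ")" found while the stack was empty

  input.each_char.with_index do |char, index|
    case char
    when "("
      open_positions.push(index)
    when ")"
      if open_positions.empty?
        stray_closers << index
      else
        open_positions.pop
      end
    end
  end

  # Whatever is left on the stack is an unmatched "(".
  drop = Set.new(stray_closers + open_positions)
  input.each_char.with_index
       .reject { |_char, index| drop.include?(index) }
       .map(&:first)
       .join
end

puts remove_unmatched_with_stack(")(a))(()(asdf)(")
```

Tracking positions rather than a plain counter means the exact unmatched characters are removed, which also handles nesting.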

Java code: available at http://a2ajp.blogspot.in/2014/10/remove-unmatched-parenthesis-from-given.html

5 solutions

Solution 1
7

Solution 2
3 2011-03-24 20:27:22

Solution 3
2 2011-03-26 03:21:59

Solution 4
1 2011-03-26 01:59:38

Solution 5
0 2014-10-13 07:38:34