Remove unmatched parentheses from a string
I want to remove "un-partnered" parentheses from a string.
I.e., all ('s should be removed unless they're followed by a ) somewhere in the string. Likewise, all )'s not preceded by a ( somewhere in the string should be removed.
Ideally the algorithm would take nesting into account as well.
E.g.:
"(a)".remove_unmatched_parents # => "(a)"
"a(".remove_unmatched_parents # => "a"
")a(".remove_unmatched_parents # => "a"
Instead of a regex, consider a push-down automaton, perhaps. (I'm not sure if Ruby regular expressions can handle this; I believe Perl's can.)
A (very trivialized) process may be:
For each character in the input string:
At the end of the process, if seen_parens is > 0 then remove that many parens, starting from the end. (This step can be merged into the above process with use of a stack or recursion.)
The entire process is O(n), albeit with relatively high overhead.
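The remark about merging the final cleanup into the main pass with a stack could be sketched roughly as follows. This is only one possible reading, not the answer's own code: the method name strip_unmatched_parens and the keep / open_stack variables are made up for illustration. It records the position of every pending "(" so that exactly the unmatched ones are dropped at the end.

```ruby
# Hypothetical helper (name and structure are assumptions, not from the answer).
def strip_unmatched_parens(input)
  keep = Array.new(input.length, true) # keep[i] says whether character i survives
  open_stack = []                      # indices of '(' still waiting for a ')'

  input.each_char.with_index do |char, index|
    case char
    when '('
      open_stack.push(index)
    when ')'
      if open_stack.empty?
        keep[index] = false            # no earlier '(' to pair with
      else
        open_stack.pop                 # pairs with the most recent pending '('
      end
    end
  end

  # Whatever is left on the stack never found a partner.
  open_stack.each { |index| keep[index] = false }

  result = String.new
  input.each_char.with_index { |char, index| result << char if keep[index] }
  result
end
```

With the examples from the question, strip_unmatched_parens("a(") and strip_unmatched_parens(")a(") both give "a", and "(a)" comes back unchanged. The work is still a single left-to-right scan plus one pass to build the result, so it stays O(n).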
Happy coding.
The following uses oniguruma. Oniguruma is the regex engine built in if you are using ruby1.9. If you are using ruby1.8, see this: oniguruma.
Update
I had been so lazy as to just copy and paste someone else's regex. It seemed to have a problem.
So now, I wrote my own. I believe it should work now.
class String
  NonParenChar = /[^\(\)]/
  def remove_unmatched_parens
    self[/
      (?:
        (?<balanced>
          \(
          (?:\g<balanced>|#{NonParenChar})*
          \)
        )
        |#{NonParenChar}
      )+
    /x]
  end
end
(?<name>regex1) names the (sub)regex regex1 as name, and makes it possible to call it. \g<name> is then a subregex that represents regex1. Note here that \g<name> does not represent a particular string that matched regex1; it represents regex1 itself. In fact, it is possible to embed \g<name> within (?<name>...), which is what makes the pattern recursive.
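A small illustration of the point that \g<name> re-runs the pattern rather than repeating the matched text may help. The backreference \k<name> is not mentioned in the answer and is used here only for contrast; the constant names are made up for this sketch.

```ruby
# \g<digit> calls the subpattern \d again, so any digit is accepted.
CALL_PATTERN    = /\A(?<digit>\d)-\g<digit>\z/
# \k<digit> is a backreference: it must repeat the exact text matched earlier.
BACKREF_PATTERN = /\A(?<digit>\d)-\k<digit>\z/

puts CALL_PATTERN.match?("1-2")    # the call matches a different digit
puts BACKREF_PATTERN.match?("1-2") # the backreference needs "1" again
puts BACKREF_PATTERN.match?("1-1")
```

The first and third checks succeed and the second fails, which is the difference between "the regex itself" and "the string it matched".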
Update 2
This is simpler.
class String
  def remove_unmatched_parens
    self[/
      (?<valid>
        \(\g<valid>*\)
        |[^()]
      )+
    /x]
  end
end
Build a simple LR parser:
tokenize, token, stack = false, "", []
")(a))(()(asdf)(".each_char do |c|
  case c
  when '('
    tokenize = true
    token = c
  when ')'
    if tokenize
      token << c
      stack << token
    end
    tokenize = false
  when /\w/
    token << c if tokenize
  end
end
result = stack.join
puts result
Running it yields:
wesbailey@feynman:~/code_katas> ruby test.rb
(a)()(asdf)
I don't agree with the folks modifying the String class, because you should never open a standard class. Regexes are pretty brittle for parsing and hard to support. I couldn't imagine coming back to the previous solutions 6 months from now and trying to remember what they were doing!
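For readers who share the concern about reopening String, one alternative is to put the logic in a standalone module. The sketch below is an assumption-laden illustration, not this answer's code: the ParenCleaner module and its method names are hypothetical, and it uses a two-pass scan (left to right to drop stray ")" characters, then right to left to drop stray "(" characters) instead of a regex.

```ruby
# Hypothetical module; nothing here modifies the String class.
module ParenCleaner
  module_function

  # Keeps every character except a +closer+ that has no earlier +opener+.
  def drop_unmatched(chars, opener, closer)
    depth = 0
    chars.select do |char|
      if char == opener
        depth += 1
        true
      elsif char == closer
        next false if depth.zero? # nothing open to pair with: drop it
        depth -= 1
        true
      else
        true
      end
    end
  end

  def remove_unmatched_parens(text)
    # Pass 1: left to right, drop ')' without a preceding '('.
    forward = drop_unmatched(text.chars, '(', ')')
    # Pass 2: right to left, drop '(' without a following ')'.
    backward = drop_unmatched(forward.reverse, ')', '(')
    backward.reverse.join
  end
end
```

Called as ParenCleaner.remove_unmatched_parens(")(a))(()(asdf)("), this returns "(a)()(asdf)", the same string the parser above prints.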
Here's my solution, based on @pst's algorithm:
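require 'strscan'

class String
  def remove_unmatched_parens
    scanner = StringScanner.new(dup)
    output = ''
    paren_depth = 0 # number of kept '(' that still lack a ')'
    # getch (rather than get_byte) keeps multibyte characters intact
    while char = scanner.getch
      if char == "("
        paren_depth += 1
        output << char
      elsif char == ")"
        # keep a ')' only when an earlier '(' is waiting for it
        output << char and paren_depth -= 1 if paren_depth > 0
      else
        output << char
      end
    end
    # drop the leftover '(' characters, starting from the end
    paren_depth.times{ output.reverse!.sub!('(', '').reverse! }
    output
  end
end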
Algorithm:
Java code: available at http://a2ajp.blogspot.in/2014/10/remove-unmatched-parenthesis-from-given.html