
Remove Everything Between Two Characters As Long As They Aren't Inside Some Other Characters

Basically, my goal is to remove everything inside ()'s except for strings that are inside "".

I was following the code here: Remove text in-between delimiters in a string (using a regex?)

And that works great, but I have the additional requirement of not removing ()s if they are in "". Is that something that can be done with a regular expression? I feel like I'm dangerously close to needing another approach, like a true parser.

This is what I've been using:

string RemoveBetween(string s, char begin, char end)
{
    Regex regex = new Regex(string.Format("\\{0}.*?\\{1}", begin, end));
    return regex.Replace(s, string.Empty);
}

I don't speak C#, but here's the Java implementation:

input.replaceAll("(?<=\\().*?(?=[\"()])(\"([^\"]*)\")?.*(?=\\))", "$2");

This produces the following results:

"foo (bar \"hello world\" foo) bar" --> "foo (hello world) bar"
"foo (bar foo) bar" --> "foo () bar"

It wasn't clear whether you wanted to preserve the quotes. If you did, use $1 instead of $2.

Now that you've got a working regex, you should be able to adapt it to C#.
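
For the other reading of the question, where parentheses that sit inside quotes must be left alone, one common regex idiom is to match quoted strings first and put them back unchanged. What follows is only a minimal Java sketch under stated assumptions: quotes are never escaped or nested, parentheses are not nested, and the class name `QuoteAwareRegex` and method `strip` are made up for illustration.

```java
import java.util.regex.Pattern;

final class QuoteAwareRegex {
    // Alternative 1 captures a whole quoted string; alternative 2 matches a (...) span.
    private static final Pattern P = Pattern.compile("(\"[^\"]*\")|\\([^)]*\\)");

    public static String strip(String s) {
        // "$1" restores a quoted string; for a (...) match, group 1 is unset
        // and expands to the empty string, so the span is deleted.
        return P.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(strip("say \"(keep)\" and (drop) this"));
    }
}
```

Unlike the pattern above, this variant removes the parentheses themselves and also drops a quoted string that appears inside a parenthesized span, so it only fits if that is the behavior you want.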

.NET regexes are even more powerful than most, and you can surely do what you want. Take a look at this, which looks for balanced parentheses; it is essentially the same problem as yours, but with parentheses rather than quotes.

http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx

It's risky to say "No you can't" on this forum, because somebody will go and ruin it by providing a working answer. :-)

But I will say that this would be really stretching regular expressions, and your problem elegantly lends itself to automata-based programming.

Personally, I'm happier maintaining a 20-line finite state machine than a 10-character regular expression.
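
To make the state-machine idea concrete, here is a rough Java sketch, not a definitive implementation. It assumes that quoted text is always kept (quotes included), that the delimiters themselves are removed along with their contents, that nested delimiters are tracked with a depth counter, and that quotes cannot be escaped. The class name `QuoteAwareRemover` is hypothetical; `removeBetween` mirrors the signature from the question.

```java
final class QuoteAwareRemover {
    // Two pieces of state: whether we are inside double quotes,
    // and how many unclosed begin delimiters we have seen outside quotes.
    public static String removeBetween(String s, char begin, char end) {
        StringBuilder out = new StringBuilder();
        boolean inQuotes = false;
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;   // toggle quote state, keep the quote
                out.append(c);
            } else if (inQuotes) {
                out.append(c);          // quoted text is always kept verbatim
            } else if (c == begin) {
                depth++;                // enter a span to be removed
            } else if (c == end && depth > 0) {
                depth--;                // leave the span
            } else if (depth == 0) {
                out.append(c);          // ordinary text outside any span
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeBetween("foo (bar \"hello world\" foo) bar", '(', ')'));
    }
}
```

With these assumptions, `foo (bar "hello world" foo) bar` becomes `foo "hello world" bar`, and parentheses inside a quoted string are passed through untouched. Supporting escaped quotes would mean adding one more state.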
