
Regex approach for extracting strings surrounded by double quotes

I have a search string that is being passed in.

Eg: "a+b",a, b, "C","d+e",ab,d

I want to filter out all substrings surrounded by double quotes (""). In the above sample, the output should contain:

"a+b","C","d+e"

Is there a way to do this without looping?

I also need to extract a string without the above values for further processing.

Eg: a,b,ab,d

Any suggestions on how to do this with minimal performance impact?

Thank you in advance for all your comments and suggestions.

Since you didn't say exactly how you want your output (do you need to keep the commas and extra whitespace? is it comma-delimited to begin with?), let's assume that it is NOT comma-delimited and you are just trying to remove the occurrences of the "xyz":

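    // requires: using System.Text.RegularExpressions;
    // matches a double quote, one or more non-quote characters, and a closing double quote
    string strRegex = @"""[^""]+""";
    string strTargetString = @" ""a+b"",a, b, ""C"",""d+e"",a-b,d";
    string strOutput = Regex.Replace(strTargetString, strRegex, string.Empty);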

This removes all of the quoted items (leaving the extra commas and whitespace).

If you need each individual match, you might want to try:

    // requires: using System.Linq;
    var y = (from Match m in Regex.Matches(strTargetString, strRegex) select m.Value).ToList();
    y.ForEach(s => Console.WriteLine(s));

To get the list of items that are not surrounded by quotes, you could either invert the regex pattern OR use the replace method from the first code sample and then split on the commas, trimming whitespace (again, assuming you are splitting on commas, which it sounds like you are).
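A minimal sketch of the replace-then-split option, assuming the input really is comma-delimited and that empty entries should be dropped. The class and method names (`UnquotedItems`, `Extract`) are made up for illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class UnquotedItems
{
    // Hypothetical helper: removes every "..." run, then splits what is left on commas.
    public static string[] Extract(string input)
    {
        // Same pattern as the first sample: a quote, one or more non-quote characters, a quote.
        string withoutQuoted = Regex.Replace(input, @"""[^""]+""", string.Empty);

        return withoutQuoted
            .Split(',')
            .Select(part => part.Trim())      // drop the whitespace around each item
            .Where(part => part.Length > 0)   // drop the empty slots left by removed items
            .ToArray();
    }

    public static void Main()
    {
        string input = @" ""a+b"",a, b, ""C"",""d+e"",a-b,d";
        Console.WriteLine(string.Join(",", Extract(input))); // prints: a,b,a-b,d
    }
}
```

`Split`, `Select` and `Where` still iterate internally, so this hides the loop rather than avoiding it.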

First, add a comma to the end of your input string:

    "a+b",a, b, "C","d+e",a-b,d,

Then, use this regular expression:

    ((?<quoted>\".+?\")|(?<unquoted>.+?)),\s*

Now you have 2 problems. Kidding!

You'll have to find a way of extracting the matches without using a loop, but at least they are separated into quoted and unquoted strings by the named groups. You could use a lambda expression to pull the data out and join it, one each for quoted and unquoted, but that just does a loop behind the scenes and may add more overhead than a simple for loop. It sounds like you're trying to eke out performance here, so time and test each method to see which gives the best results.
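One way the named groups might be read back with lambda expressions is sketched below. The names `QuotedSplitter` and `Split` are hypothetical, and the sketch assumes there is no whitespace between an item and the comma that follows it; leading whitespace is trimmed so that a quoted first item is not captured by the `unquoted` group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedSplitter
{
    // The pattern from the answer above; the quotes need no backslash in a verbatim string.
    private static readonly Regex ItemPattern =
        new Regex(@"((?<quoted>"".+?"")|(?<unquoted>.+?)),\s*");

    // Hypothetical helper: fills one list per named group.
    public static void Split(string input, out List<string> quoted, out List<string> unquoted)
    {
        // Append the trailing comma so the last item is terminated like the others.
        List<Match> matches = ItemPattern.Matches(input.Trim() + ",").Cast<Match>().ToList();

        quoted = matches.Where(m => m.Groups["quoted"].Success)
                        .Select(m => m.Groups["quoted"].Value)
                        .ToList();
        unquoted = matches.Where(m => m.Groups["unquoted"].Success)
                          .Select(m => m.Groups["unquoted"].Value)
                          .ToList();
    }

    public static void Main()
    {
        Split(@"""a+b"",a, b, ""C"",""d+e"",a-b,d", out var quoted, out var unquoted);
        Console.WriteLine(string.Join(",", quoted));   // prints: "a+b","C","d+e"
        Console.WriteLine(string.Join(",", unquoted)); // prints: a,b,a-b,d
    }
}
```

The regex runs once, and the two LINQ queries each walk the resulting match list, so there are still loops behind the scenes.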
