
Parsing as string of data but leaving out quotes

I need to use a regex to run through a string of text but return only the parts that I need. For example, say the string is as follows:

1234,Weapon Types,100,Handgun,"This is the text, "and", that is all."""

\\d*,Weapon Types,(\\d*),(\\w+), gets me most of the way, but the last part is where I am having an issue. Is there a way for me to capture the rest of the string, i.e.

"This is the text, "and", that is all."""

without picking up the quotes? I've tried negating them, but then the match just stops at the first quote.

Please keep in mind that the text in this string is unknown, so literal matches will not work.

You've given us something very difficult to solve. It's okay that you have nested commas inside your string: once we come across a double quote, we can ignore everything until the closing quote, and that gobbles up the commas.

But how will your parser know that the next double quote isn't ending the string? How does it know that it is a nested double quote?
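To make the ambiguity concrete, here is a minimal sketch that applies a negated character class to the input from the question. The names NegatedClassDemo and CaptureUntilNextQuote are made up for illustration. The capture ends at the quote in front of and, so only "This is the text, " comes back, which is the behaviour the question describes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // Hypothetical helper: captures everything after the opening quote
    // up to (but not including) the next double quote.
    // Returns an empty string when the line does not match at all.
    public static string CaptureUntilNextQuote(string line)
    {
        var m = Regex.Match(line, @"^\d*,Weapon Types,(\d*),(\w+),""([^""]*)");
        return m.Success ? m.Groups[3].Value : string.Empty;
    }

    public static void Main()
    {
        var txt = "1234,Weapon Types,100,Handgun,\"This is the text, \"and\", that is all.\"\"\"";
        // The regex cannot tell a nested quote from the closing one,
        // so everything from the nested quote onwards is lost.
        Console.WriteLine("[" + CaptureUntilNextQuote(txt) + "]");
    }
}
```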

If I could slightly modify your input string to make it clear what is a nested quote, then parsing is easy...

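        // The nested quotes are written as single quotes here (the modification
        // mentioned above), so the next double quote really is the closing one.
        var txt = "1234,Weapon Types,100,Handgun,\"This is the text, 'and', that is all.\",other stuff";
        var m = Regex.Match(txt, @"^\d*,Weapon Types,(\d*),(\w+),""([^""]+)""");
        MessageBox.Show(m.Groups[3].Value);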

But if your input string must have nested quotes like that, then we must come up with some other rule for detecting the real end of the string. How about this?

        var txt = "1234,Weapon Types,100,Handgun,\"This is the text, \"and\", that is all.\",other stuff";
        // Rule: the greedy (.+) runs up to the last double quote that is followed by a comma.
        var m = Regex.Match(txt, @"^\d*,Weapon Types,(\d*),(\w+),""(.+)"",");
        MessageBox.Show(m.Groups[3].Value);

The result is...

This is the text, "and", that is all.
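That rule depends on a comma following the closing quote. The string in the question has no further field and simply ends in a run of quotes, so one possible variant is to anchor on the end of the input instead. This is only a sketch: it assumes the quoted field is always the last one on the line, and the names QuotedTailDemo and ExtractTail are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedTailDemo
{
    // Assumption: the quoted field is the last field on the line, so the run
    // of double quotes at the very end of the input is taken as the terminator.
    // The lazy (.+?) stops as soon as only quotes remain before the end.
    private static readonly Regex Pattern = new Regex(
        @"^\d*,Weapon Types,(\d*),(\w+),""(.+?)""+$");

    // Returns the text of the quoted field, or an empty string on no match.
    public static string ExtractTail(string line)
    {
        var m = Pattern.Match(line);
        return m.Success ? m.Groups[3].Value : string.Empty;
    }

    public static void Main()
    {
        var txt = "1234,Weapon Types,100,Handgun,\"This is the text, \"and\", that is all.\"\"\"";
        Console.WriteLine(ExtractTail(txt)); // This is the text, "and", that is all.
    }
}
```

One consequence of this choice: if the text itself legitimately ends with a double quote, that quote is stripped along with the terminator, because nothing in the input distinguishes the two.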

