
Parsing quotes in string in Scala

I am trying to parse the following string:

val s1 = """ "foo","bar", "foo,bar" """

The output I am hoping to get from this parsing is:

 List[String] ["foo","bar","foo,bar"] length 3

I am able to parse the following:

val s2 = """ "foo","bar", 'foo,bar' """

by using the following pattern:

val pattern = "(('[^']*')|([^,]+))".r

 pattern.findAllMatchIn(s2).map(_.toString).toList
 gives  ["foo","bar", 'foo,bar'] :length 3

But I am not able to figure out the pattern for s1. Note that I need to parse both s1 and s2 successfully.

EDIT: Currently I am able to parse:

"foo,bar,foo bar" => [foo,bar,foo bar]
"foo,bar, 'foo bar' " => [foo, bar , 'foo bar'] //len 3

I want to parse these lines as well, along with the following line:

 """ foo, bar, "foo,bar" """ // gives [foo,bar,"foo,bar"] len 3

The following works for your s1 and s2 examples:

(["']).*?\1

["'] matches a double or single quote, which is captured as group 1. We then match anything up to a closing quote of the same kind as the opening quote, using the backreference \1. The match .*? is non-greedy so that it stops at the first closing quote instead of running on to the last one in the string.

Note that you'll need to use triple quoting, since the pattern has a quote in it:

val pattern =  """(["']).*?\1""".r
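As a minimal sketch of how this pattern behaves, the snippet below runs it over s1 and s2 from the question and, for contrast, runs a greedy variant over s1. The object name QuoteMatch and the helper names quoted and greedyQuoted are made up for illustration; only the lazy pattern comes from the answer.

```scala
import scala.util.matching.Regex

object QuoteMatch {
  // Pattern from the answer: an opening quote (group 1), a lazy run of
  // characters, then the same quote character again via the backreference \1.
  val pattern: Regex = """(["']).*?\1""".r

  // Greedy variant, shown only for comparison (not part of the answer).
  val greedy: Regex = """(["']).*\1""".r

  // Hypothetical helpers: return every match, quote characters included.
  def quoted(s: String): List[String] = pattern.findAllIn(s).toList
  def greedyQuoted(s: String): List[String] = greedy.findAllIn(s).toList

  def main(args: Array[String]): Unit = {
    val s1 = """ "foo","bar", "foo,bar" """
    val s2 = """ "foo","bar", 'foo,bar' """

    println(quoted(s1).length)       // 3 fields, each still wrapped in double quotes
    println(quoted(s2).length)       // 3 fields, the last wrapped in single quotes
    println(greedyQuoted(s1).length) // 1: the greedy .* runs on to the last quote
  }
}
```

The matches keep their surrounding quote characters. If the bare field text is wanted, stripping the first and last character of each match is one option.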

Update to handle further cases added to the question:

To also handle your comma-separated examples, you need to match combinations of word characters \w or whitespace \s, starting with a word character and terminated by either a comma or the end of the line, but excluding the terminating character by using a lookahead (?=(,|$)):

(["']).*?\1|\w(\w|\s)*(?=(,|$))
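A short sketch of the combined pattern applied to the inputs from the question is given below. The object name FieldSplitter and the helper fields are assumptions introduced for this example; the regular expression itself is the one shown above.

```scala
import scala.util.matching.Regex

object FieldSplitter {
  // Either a quoted field (single or double quotes), or an unquoted run of
  // word/whitespace characters that starts with a word character and is
  // followed by a comma or the end of the input (checked by the lookahead).
  val pattern: Regex = """(["']).*?\1|\w(\w|\s)*(?=(,|$))""".r

  // Hypothetical helper: collect every match as a field, quotes included.
  def fields(s: String): List[String] = pattern.findAllIn(s).toList

  def main(args: Array[String]): Unit = {
    val inputs = List(
      "foo,bar,foo bar",
      "foo,bar, 'foo bar' ",
      """ foo, bar, "foo,bar" """,
      """ "foo","bar", "foo,bar" """
    )
    // Each of these inputs yields three fields.
    inputs.foreach(in => println(fields(in).length))
  }
}
```

Because \s is part of the unquoted alternative, an unquoted field can carry trailing whitespace (for example, "foo ,bar" gives "foo " as its first field), so trimming each match may be worth considering.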
