Regex to match words between single or double quotes in a string

I'm looking for a regex that gives me the following results:

  • it needs to group words surrounded by single or double quotes
  • it needs to keep a single quote in the output when the string contains no other single quote to pair it with
  • when text is not surrounded by single or double quotes, it needs to split on spaces

I currently have:

Pattern pattern = Pattern.compile("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'");

... but it does not fully work for the following examples. Can anyone help me with this one?

Examples:

  • foo bar
    • group1: foo
    • group2: bar
    • description: split on space
  • "foo bar"
    • group1: foo bar
    • description: surrounded by double quotes, so group foo and bar, but don't print the double quotes
  • 'foo bar'
    • group1: foo bar
    • description: same as above, but with single quotes
  • 'foo bar
    • group1: 'foo
    • group2: bar
    • description: split on space and keep the single quote
  • "'foo bar"
    • group1: 'foo bar
    • description: surrounded by double quotes, so group 'foo and bar and keep the single quote
  • foo bar'
    • group1: foo
    • group2: bar'
  • foo bar"
    • group1: foo
    • group2: bar"
  • "foo bar" "stack overflow"
    • group1: foo bar
    • group2: stack overflow
  • "foo' bar" "stack overflow" how do you do
    • group1: foo' bar
    • group2: stack overflow
    • group3: how
    • group4: do
    • group5: you
    • group6: do
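
To see where the current pattern falls short, here is a minimal sketch that runs it over a few of the examples. The class name OriginalPatternDemo and the helper tokens are made up for illustration; only the pattern itself comes from the question. Because the first alternative excludes quote characters, an unpaired quote is never part of any match and is silently dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern is taken from the question.
class OriginalPatternDemo {
    static final Pattern ORIGINAL =
            Pattern.compile("[^\\s\"']+|\"([^\"]*)\"|'([^']*)'");

    // One token per find(): the quoted content if a group matched, else the whole match.
    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add(m.group(1));
            } else if (m.group(2) != null) {
                out.add(m.group(2));
            } else {
                out.add(m.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("'foo bar"));       // [foo, bar]   wanted: ['foo, bar]
        System.out.println(tokens("foo bar\""));      // [foo, bar]   wanted: [foo, bar"]
        System.out.println(tokens("\"'foo bar\""));   // ['foo bar]   as wanted
    }
}
```

So the paired-quote cases already work; the failing cases are the ones with an unpaired quote.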

I'm not sure you can do this with a single Matcher.matches() call, but you can do it with a loop.
The following code handles all the cases you mention above by calling Matcher.find() repeatedly:

import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteSplit {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\"([^\"]+)\"|'([^']+)'|\\S+");
        List<String> testStrings = Arrays.asList("foo bar", "\"foo bar\"","'foo bar'", "'foo bar", "\"'foo bar\"", "foo bar'", "foo bar\"", "\"foo bar\" \"stack overflow\"", "\"foo' bar\" \"stack overflow\" how do you do");
        for (String testString : testStrings) {
            int count = 1;
            Matcher matcher = pattern.matcher(testString);
            System.out.format("* %s%n", testString);
            while (matcher.find()) {
                // Prefer the quoted content (group 1 or 2); otherwise fall back to the whole \S+ match.
                System.out.format("\t* group%d: %s%n", count++, matcher.group(1) == null ? matcher.group(2) == null ? matcher.group() : matcher.group(2) : matcher.group(1));
            }
        }
    }
}

This prints:

* foo bar
    * group1: foo
    * group2: bar
* "foo bar"
    * group1: foo bar
* 'foo bar'
    * group1: foo bar
* 'foo bar
    * group1: 'foo
    * group2: bar
* "'foo bar"
    * group1: 'foo bar
* foo bar'
    * group1: foo
    * group2: bar'
* foo bar"
    * group1: foo
    * group2: bar"
* "foo bar" "stack overflow"
    * group1: foo bar
    * group2: stack overflow
* "foo' bar" "stack overflow" how do you do
    * group1: foo' bar
    * group2: stack overflow
    * group3: how
    * group4: do
    * group5: you
    * group6: do
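
One edge case the examples do not cover is an empty pair of quotes. With the pattern above, quoted content must be at least one character long, so "" falls through to \S+ and comes back as a literal two-character token. The sketch below contrasts that with a variant that uses * instead of +. The variant is an assumption on my part, not something the question asked for, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing two quantifiers on empty quotes.
class EmptyQuotesDemo {
    // Pattern from the answer: '+' requires non-empty quoted content.
    static final Pattern PLUS = Pattern.compile("\"([^\"]+)\"|'([^']+)'|\\S+");
    // Assumed variant: '*' also accepts empty quoted content.
    static final Pattern STAR = Pattern.compile("\"([^\"]*)\"|'([^']*)'|\\S+");

    static List<String> tokens(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add(m.group(1));
            } else if (m.group(2) != null) {
                out.add(m.group(2));
            } else {
                out.add(m.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens(PLUS, "a \"\" b")); // [a, "", b]  (middle token is the two quote characters)
        System.out.println(tokens(STAR, "a \"\" b")); // [a, , b]    (middle token is the empty string)
    }
}
```

Which behaviour is right depends on whether an empty quoted argument should count as a token of its own.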

Any time you have pairings (whether quotes or braces), you leave the realm of regex and enter the realm of grammars, which need parsers.

I'll leave you with the ultimate answer to this question

UPDATE:

A little more explanation.

A grammar is usually expressed as:

construct -> [set of constructs or terminals]

For example, for quotes:

doublequotedstring := " simplequotedstring "
simplequotedstring := string ' string
                      | string '
                      | ' string
                      | '

This is a simple example; you can find proper examples of grammars for quoting on the internet.

I have used aflex and ajacc for this (for Ada; for Java there are jflex and jjacc). You pass the list of identifiers to aflex, generate an output, pass that output and the grammar to ajacc, and you get an Ada parser. It has been a long time since I used them, so I do not know whether there are more streamlined solutions now, but fundamentally they will need the same input.
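
For a case this small, a generated parser may be more than is needed. As a rough sketch of the same idea done by hand, the scanner below walks the string once and decides at each quote whether a closing partner exists. The class name QuoteScanner and the method scan are made up for illustration, and the rules (an unpaired or empty quote is treated as part of an ordinary word) are my assumption, based on the examples in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner; not generated by any lexer/parser tool.
class QuoteScanner {
    static List<String> scan(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // skip separators between tokens
                continue;
            }
            if (c == '"' || c == '\'') {
                int close = s.indexOf(c, i + 1);
                if (close > i + 1) {
                    // Paired quote with non-empty content: emit the content without the quotes.
                    out.add(s.substring(i + 1, close));
                    i = close + 1;
                    continue;
                }
            }
            // Ordinary word (possibly starting with an unpaired quote): read up to the next whitespace.
            int end = i;
            while (end < n && !Character.isWhitespace(s.charAt(end))) {
                end++;
            }
            out.add(s.substring(i, end));
            i = end;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(scan("'foo bar"));                                      // ['foo, bar]
        System.out.println(scan("\"'foo bar\""));                                  // ['foo bar]
        System.out.println(scan("\"foo' bar\" \"stack overflow\" how do you do")); // [foo' bar, stack overflow, how, do, you, do]
    }
}
```

The scanner makes the pairing rule explicit in one place, which can be easier to extend (for instance with escape characters) than a growing regex.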
