Java Regex Capture Text Between """ or '''

I have a document that I am trying to parse with Java regex. It contains text enclosed in triple quotes, either """ or ''', so you have:

""" Bla, you're not very nice! """

or:

''' Bla, this 1 isn't a great example '''

I have been trying something along the lines of ["""|''']([\\p{Alnum}|\\p{Blank}]+)[\\"""|''']

Assumptions:

  1. The text will start and end with either """ or '''.
  2. The text could include numbers, letters, blanks and punctuation.
  3. The body of the text will not include a sequence of three " or three '.

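Try this pattern: ("""|''').*?\\1

The backreference \1 (written \\1 inside a Java string literal) requires the closing delimiter to be the same as the one that opened the block.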

Given:

"""Hello, World!""" some unquoted text """ lorem ipsum ''" dolor """ some more unquoted text '''single quotes'''
''' Bla, this 1 isn't a great example '''

It will match:

  1. """Hello, World!"""
  2. """ lorem ipsum ''" dolor """
  3. '''single quotes'''
  4. ''' Bla, this 1 isn't a great example '''
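To check the list above, here is a minimal sketch that runs the suggested pattern over the same input with Matcher.find(). The class name TripleQuoteFinder and the helper findQuoted are made up for illustration; only the pattern itself comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TripleQuoteFinder {
    // Group 1 captures the opening delimiter; \1 demands the same closing delimiter.
    private static final Pattern QUOTED = Pattern.compile("(\"\"\"|''').*?\\1");

    // Hypothetical helper: returns every delimited block, delimiters included.
    static List<String> findQuoted(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "\"\"\"Hello, World!\"\"\" some unquoted text "
                + "\"\"\" lorem ipsum ''\" dolor \"\"\" some more unquoted text "
                + "'''single quotes'''\n"
                + "''' Bla, this 1 isn't a great example '''";
        // Prints one match per line, in the order they occur in the input.
        for (String match : findQuoted(input)) {
            System.out.println(match);
        }
    }
}
```

Note that . does not match line terminators by default, so a quoted block that spans several lines would need Pattern.DOTALL passed to Pattern.compile.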

You can probably also be more specific than .*? but I wasn't sure what characters you meant by "punctuation".
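As one possible way to be more specific, the sketch below restricts the body to the POSIX classes from the question plus \p{Punct}. This assumes "punctuation" means ASCII punctuation, which the question does not confirm; the class name StrictQuoteBody and the helper body are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictQuoteBody {
    // Body limited to ASCII letters/digits, space/tab, and ASCII punctuation.
    // Group 2 is added here to expose the text between the delimiters.
    private static final Pattern STRICT =
            Pattern.compile("(\"\"\"|''')([\\p{Alnum}\\p{Blank}\\p{Punct}]*?)\\1");

    // Hypothetical helper: returns the inner text, or null when nothing matches.
    static String body(String input) {
        Matcher m = STRICT.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Only ASCII letters, digits, blanks and punctuation: the body is found.
        System.out.println(body("''' Bla, this 1 isn't a great example '''"));
        // Contains a non-ASCII letter, which \p{Alnum} rejects by default: null.
        System.out.println(body("\"\"\"caf\u00e9\"\"\""));
    }
}
```

The trade-off is that anything outside those classes (accented letters, line breaks) makes the block unmatchable, so .*? remains the more forgiving choice.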

Something like this worked for me:

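        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        public class QuoteCapture
        {
            public static void main(String[] args)
            {
                // Groups 1-2 cover the """ form, groups 3-4 the ''' form.
                Pattern p = Pattern.compile("(\"{3}(.*?)\"{3})|('{3}(.*?)'{3})");
                String s1 = "\"\"\" Bla, you're not very nice! \"\"\"";
                String s2 = "''' Bla, this 1 isn't a great example '''";

                Matcher m1 = p.matcher(s1);
                Matcher m2 = p.matcher(s2);

                // matches() succeeds only if the whole input fits the pattern.
                if (m1.matches())
                {
                    System.out.println(m1.group(2)); // text between """ and """
                }

                if (m2.matches())
                {
                    System.out.println(m2.group(4)); // text between ''' and '''
                }
            }
        }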

It would, however, be simpler to just use two regular expressions. The above code yielded the following:

Bla, you're not very nice!

Bla, this 1 isn't a great example
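The two-regular-expression variant mentioned above could look roughly like this. It is only a sketch: the class name TwoPatterns, the helper extract, and the trim() call are additions for illustration, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPatterns {
    // One pattern per delimiter, so the captured text is always group 1.
    private static final Pattern DOUBLE = Pattern.compile("\"{3}(.*?)\"{3}");
    private static final Pattern SINGLE = Pattern.compile("'{3}(.*?)'{3}");

    // Hypothetical helper: tries the """ form, then the ''' form.
    // Returns the trimmed inner text, or null if neither pattern matches.
    static String extract(String input) {
        for (Pattern p : new Pattern[] {DOUBLE, SINGLE}) {
            Matcher m = p.matcher(input);
            if (m.matches()) {
                return m.group(1).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(extract("\"\"\" Bla, you're not very nice! \"\"\""));
        System.out.println(extract("''' Bla, this 1 isn't a great example '''"));
    }
}
```

With separate patterns there is no need to remember which group number belongs to which delimiter.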

One of the issues with your regular expression is that square brackets define a character class: the characters inside them are already OR'd together, so the pipe character is treated as a literal | rather than as an OR operator. You will need to replace your square brackets with round ones (a group) for the alternation to work.
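The difference between the two kinds of brackets can be seen in a small sketch like the following. The class and constant names are made up for illustration; Pattern.matches is the standard java.util.regex call that tests a whole input against a regex.

```java
import java.util.regex.Pattern;

public class ClassVersusGroup {
    // Character class: matches exactly ONE character out of ", | and '.
    static final String CHAR_CLASS = "[\"\"\"|''']";
    // Group with alternation: matches the three-character sequence """ or '''.
    static final String GROUP = "(\"\"\"|''')";

    public static void main(String[] args) {
        System.out.println(Pattern.matches(CHAR_CLASS, "|"));   // true: | is a literal member
        System.out.println(Pattern.matches(CHAR_CLASS, "'''")); // false: a class matches one char
        System.out.println(Pattern.matches(GROUP, "'''"));      // true: one of the alternatives
        System.out.println(Pattern.matches(GROUP, "|"));        // false: | is an operator here
    }
}
```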
