
Splitting by Java regular expression

I have a string like:

Snt:It was the most widespread day of environmental action in the planet's history
====================
-----------
Snt:Five years ago, I was working for just over minimum wage
====================
-----------

and I want to split the string with

====================
-----------

and, of course, remove Snt: from the start of each sentence. What is the best way?

I used this regular expression, but it didn't work:

String[] content1 =content.split("\\n\\====================\\n\\-----------\\n");

Thanks in advance.

What about:

Pattern p = Pattern.compile("^Snt:(.*)$", Pattern.MULTILINE);
Matcher m = p.matcher(str);

while (m.find()) {
    String sentence = m.group(1);
}

Rather than hacking around with split and doing extra parsing, this just looks for lines beginning with "Snt:" and captures whatever follows.
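The snippet above can be fleshed out into something runnable. This is only a minimal sketch: the class name `SntLines` and the method `extract` are made up for illustration, and it assumes every record starts on its own line with `Snt:`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class SntLines {
    // MULTILINE makes ^ and $ match at the start and end of every line.
    private static final Pattern SNT_LINE = Pattern.compile("^Snt:(.*)$", Pattern.MULTILINE);

    // Returns the text after "Snt:" for every line that starts with it.
    static List<String> extract(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SNT_LINE.matcher(text);
        while (m.find()) {
            sentences.add(m.group(1));
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Snt:It was the most widespread day of environmental action in the planet's history\n"
                + "====================\n"
                + "-----------\n"
                + "Snt:Five years ago, I was working for just over minimum wage\n"
                + "====================\n"
                + "-----------";
        for (String sentence : extract(text)) {
            System.out.println(sentence);
        }
    }
}
```

The separator lines never start with `Snt:`, so they are skipped without being described in the pattern at all.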

Because of the way the data is structured, I would reverse the concept from a split to a matcher instead. This also lets you match the Snt: prefix nicely:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static final String VAL = "Snt:It was the most widespread day of environmental action in the planet's history\n"
        + "====================\n"
        + "-----------\n"
        + "Snt:Five years ago, I was working for just over minimum wage\n"
        + "====================\n"
        + "-----------";

public static void main(String[] args) {
    List<String> phrases = new ArrayList<String>();
    // One match per record: group 1 is the sentence, the rest is the delimiter.
    Matcher mat = Pattern.compile("Snt:(.+?)\n={20}\n-{11}\\s*").matcher(VAL);
    while (mat.find()) {
        phrases.add(mat.group(1));
    }

    System.out.printf("Value: %s%n", phrases);
}

I use the regex: "Snt:(.+?)\n={20}\n-{11}\\s*"

This assumes that the first word in the file is Snt:, and then it groups the phrase that follows, up to the delimiter. It also consumes any trailing whitespace, leaving the expression ready for the next record.

The upside of this approach is that each match covers a single record, instead of having an expression that matches the end of one record and perhaps the beginning of the next.
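If the input might come from a file with Windows line endings (an assumption; the question only shows `\n`), the same one-match-per-record idea could be sketched with `\R`, which matches any line break in Java 8 and later. The class name `SntRecords` and the method `extract` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class SntRecords {
    // \R matches \n, \r\n and other line breaks (Java 8+).
    // Group 1 is the sentence; the separator lines and trailing whitespace are consumed.
    private static final Pattern RECORD = Pattern.compile("Snt:(.+?)\\R={20}\\R-{11}\\s*");

    // Returns the sentence of every complete record found in the text.
    static List<String> extract(String text) {
        List<String> phrases = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            phrases.add(m.group(1));
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Same sample data as above, but with \r\n line endings.
        String text = "Snt:It was the most widespread day of environmental action in the planet's history\r\n"
                + "====================\r\n"
                + "-----------\r\n"
                + "Snt:Five years ago, I was working for just over minimum wage\r\n"
                + "====================\r\n"
                + "-----------";
        System.out.println(extract(text));
    }
}
```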

Because there is no newline at the very end of the string, your regex won't match the last ==, -- lines. You need to add the end-of-line anchor $ as an alternative to the final \\n in your regex.

String s = "Snt:It was the most widespread day of environmental action in the planet's history\n" +
"====================\n" +
"-----------\n" +
"Snt:Five years ago, I was working for just over minimum wage\n" +
"====================\n" +
"-----------";
String m = s.replaceAll("(?m)^Snt:", "");
String[] tok = m.split("\\n\\====================\\n\\-----------(?:\\n|$)");
System.out.println(Arrays.toString(tok));

Output:

[It was the most widespread day of environmental action in the planet's history, Five years ago, I was working for just over minimum wage]
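To see the effect of the `(?:\\n|$)` alternative in isolation, the two delimiters can be compared side by side. This is a sketch, not part of the answer: `SplitTail`, `splitStrict` and `splitLenient` are hypothetical names, and the delimiter is written with `={20}` and `-{11}` quantifiers, which is assumed to be equivalent to spelling the characters out.

```java
import java.util.Arrays;

// Hypothetical demo class, named for illustration only.
class SplitTail {
    // Newline, 20 '=' characters, newline, 11 '-' characters.
    static final String DELIM = "\\n={20}\\n-{11}";

    // The delimiter must be followed by a newline, as in the question.
    static String[] splitStrict(String text) {
        return text.split(DELIM + "\\n");
    }

    // The delimiter may be followed by a newline or by the end of the input.
    static String[] splitLenient(String text) {
        return text.split(DELIM + "(?:\\n|$)");
    }

    public static void main(String[] args) {
        String text = "Snt:first\n"
                + "====================\n"
                + "-----------\n"
                + "Snt:second\n"
                + "====================\n"
                + "-----------"; // no trailing newline
        // Strict: the last element still carries its separator lines.
        System.out.println(Arrays.toString(splitStrict(text)));
        // Lenient: the final separator is consumed as well.
        System.out.println(Arrays.toString(splitLenient(text)));
    }
}
```

With the strict pattern the last separator is never matched, so it stays attached to the final element. With the lenient pattern it is matched at the end of the input, and `split` drops the resulting trailing empty string.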
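
// Alternative: capture each run of characters that are neither '=' nor '-',
// then skip the separator lines that follow it.
// Note: group 1 still starts with "Snt:" and ends with the newline before the
// separator, and a '-' or '=' inside a sentence would cut the match short.
Matcher m = Pattern.compile("([^=\\-]+)([=\\-]+[\\t\\n\\s]*)+").matcher(str);
while (m.find()) {
    String match = m.group(1);
    System.out.println(match);
}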
