Splitting a string with a Java regular expression

Question

I have a string like:

Snt:It was the most widespread day of environmental action in the planet's history
====================
-----------
Snt:Five years ago, I was working for just over minimum wage
====================
-----------

and I want to split the string on the delimiter

====================
-----------

and, of course, remove Snt: from the start of each sentence. What is the best way?

I used this regular expression, but it didn't work!

String[] content1 =content.split("\\n\\====================\\n\\-----------\\n");

Thanks in advance.

Answer 1

What about:

Pattern p = Pattern.compile("^Snt:(.*)$", Pattern.MULTILINE);
Matcher m = p.matcher(str);

while (m.find()) {
    String sentence = m.group(1);
}

Rather than hacking around with split and doing extra parsing, this just looks for lines beginning with "Snt:" and captures whatever follows.
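For completeness, here is a minimal self-contained sketch of the same idea that collects the captured sentences into a list. The class name SntLines, the method name extractSentences, and the shortened sample input are assumptions made for illustration; only the pattern comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SntLines {
    // With MULTILINE, ^ and $ match at the start and end of every line,
    // so each "Snt:" line is matched on its own.
    private static final Pattern SNT_LINE =
            Pattern.compile("^Snt:(.*)$", Pattern.MULTILINE);

    // Hypothetical helper: returns the text after "Snt:" for each matching line.
    static List<String> extractSentences(String input) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SNT_LINE.matcher(input);
        while (m.find()) {
            sentences.add(m.group(1));
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Delimiter lines do not start with "Snt:", so they are simply skipped.
        String input = "Snt:first sentence\n"
                + "=====\n"
                + "-----\n"
                + "Snt:second sentence\n"
                + "=====\n"
                + "-----\n";
        System.out.println(extractSentences(input));
    }
}
```

Because the delimiter lines are never matched, this approach does not depend on how many = or - characters they contain.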

Answer 2

Because of the way the data is structured, I would reverse the concept from a split to a matcher instead. This allows you to match the Snt: prefix nicely as well:

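import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is arbitrary; it only wraps the snippet so that it compiles.
public class SntMatcher {
    private static final String VAL = "Snt:It was the most widespread day of environmental action in the planet's history\n"
            + "====================\n"
            + "-----------\n"
            + "Snt:Five years ago, I was working for just over minimum wage\n"
            + "====================\n"
            + "-----------";

    public static void main(String[] args) {
        List<String> phrases = new ArrayList<String>();
        // One match per record: the sentence is captured, the delimiter lines are consumed.
        Matcher mat = Pattern.compile("Snt:(.+?)\n={20}\n-{11}\\s*").matcher(VAL);
        while (mat.find()) {
            phrases.add(mat.group(1));
        }

        System.out.printf("Value: %s%n", phrases);
    }
}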

I use the regex (written as a Java string literal): "Snt:(.+?)\n={20}\n-{11}\\s*"

This assumes that the first word in the file is Snt:, and then it groups the phrase that follows, up to the delimiter. It will consume any trailing whitespace, making the expression ready for the next record.

The upside of this process is that each match covers a single record, instead of having an expression that matches part of the end of one record and perhaps the beginning of the next.
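The effect of the trailing \\s* can be sketched as follows: the same pattern yields the same records whether or not the input ends with extra newlines. The class name SntRecords, the parse and repeat helpers, and the short sample sentences are assumptions made for illustration; the pattern itself is the one from this answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SntRecords {
    // Delimiter built programmatically: 20 '=' characters, a newline, 11 '-' characters.
    static final String DELIMITER = repeat('=', 20) + "\n" + repeat('-', 11);

    // One match = one whole record. The trailing \s* swallows any whitespace
    // (including newlines) that sits between this record and the next one.
    private static final Pattern RECORD =
            Pattern.compile("Snt:(.+?)\n={20}\n-{11}\\s*");

    // Hypothetical helper that works on Java 8 (String.repeat needs Java 11).
    static String repeat(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    // Hypothetical helper: returns the captured sentence of every record.
    static List<String> parse(String input) {
        List<String> phrases = new ArrayList<>();
        Matcher mat = RECORD.matcher(input);
        while (mat.find()) {
            phrases.add(mat.group(1));
        }
        return phrases;
    }

    public static void main(String[] args) {
        String r1 = "Snt:first sentence\n" + DELIMITER;
        String r2 = "Snt:second sentence\n" + DELIMITER;
        // Input that stops right after the last dash.
        System.out.println(parse(r1 + "\n" + r2));
        // Input that ends with extra blank lines; the result is the same.
        System.out.println(parse(r1 + "\n" + r2 + "\n\n"));
    }
}
```

Note that the pattern hard-codes "\n" between the lines, so input with Windows line endings would need an adjusted pattern.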

Answer 3

Because there is no newline at the end of the input, your regex won't match the final ==== and ---- lines. You need to add the end-of-line anchor $ as an alternative to the last \\n in your regex.

String s = "Snt:It was the most widespread day of environmental action in the planet's history\n" +
"====================\n" +
"-----------\n" +
"Snt:Five years ago, I was working for just over minimum wage\n" +
"====================\n" +
"-----------";
String m = s.replaceAll("(?m)^Snt:", "");
String[] tok = m.split("\\n\\====================\\n\\-----------(?:\\n|$)");
System.out.println(Arrays.toString(tok));

Output:

[It was the most widespread day of environmental action in the planet's history, Five years ago, I was working for just over minimum wage]
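To make the difference concrete, the following sketch runs both the pattern from the question and the corrected pattern on input that lacks a trailing newline. The class name SplitComparison, the constant names, the splitRecords helper, and the short sample sentences are assumptions made for illustration. The backslashes before = and - are dropped here because those characters need no escaping outside a character class; the patterns are otherwise the same.

```java
import java.util.Arrays;

public class SplitComparison {
    static final String EQ = "====================";
    static final String DASH = "-----------";

    // Pattern from the question: insists on a newline after the dashes.
    static final String ORIGINAL = "\\n" + EQ + "\\n" + DASH + "\\n";
    // Corrected pattern: accepts either a newline or the end of the input.
    static final String FIXED = "\\n" + EQ + "\\n" + DASH + "(?:\\n|$)";

    // Hypothetical helper: strips the "Snt:" prefixes, then splits on the delimiter.
    static String[] splitRecords(String input, String delimiterRegex) {
        return input.replaceAll("(?m)^Snt:", "").split(delimiterRegex);
    }

    public static void main(String[] args) {
        // Note: no newline after the final dashes.
        String input = "Snt:first sentence\n" + EQ + "\n" + DASH + "\n"
                + "Snt:second sentence\n" + EQ + "\n" + DASH;

        // With ORIGINAL, the last element still carries its delimiter lines.
        System.out.println(Arrays.toString(splitRecords(input, ORIGINAL)));
        // With FIXED, the final delimiter is consumed as well.
        System.out.println(Arrays.toString(splitRecords(input, FIXED)));
    }
}
```

With the corrected pattern the final delimiter produces a trailing empty string, which String.split discards by default, so only the two sentences remain.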

Answer 4

Matcher m = Pattern.compile("([^=\\-]+)([=\\-]+[\\t\\n\\s]*)+").matcher(str);   
while (m.find()) {
    String match = m.group(1);
    System.out.println(match);
}

4 answers

Answer 1
3 2014-10-03 17:05:08

Answer 2
2 2014-10-03 17:02:35

Answer 3
1 2014-10-03 16:55:54

Answer 4
0 2014-10-06 07:22:29
