
How to read a paragraph from a file in Java

I am given a file that contains many paragraphs. The output I expect is to read one paragraph at a time and perform an operation on it.

final String PARAGRAPH_SPLIT_REGEX = "(?m)(?=^\\s{4})";

String currentLine;

final BufferedReader bf = new BufferedReader(new FileReader("filename"));

// Read the whole file into one string, line by line.
currentLine = bf.readLine();

final StringBuilder stringBuilder = new StringBuilder();
while (currentLine != null) {
    stringBuilder.append(currentLine);
    stringBuilder.append(System.lineSeparator());
    currentLine = bf.readLine();
}

// Split the text into paragraphs with the regex above.
final String value = stringBuilder.toString();
final String[] paragraph = value.split(PARAGRAPH_SPLIT_REGEX);

for (final String s : paragraph) {
    System.out.println(s);
}

The file (each paragraph is preceded by 2 characters of whitespace, and there are no blank lines between paragraphs):

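                      Story

  Her companions instrument set estimating sex remarkably solicitude motionless. Property men the why smallest graceful day insisted required. Inquiry justice country old placing sitting any ten age. Looking venture justice in evident in totally he do ability. Be is lose girl long of up give.
  "Trifling wondered unpacked ye at he. In household certainty an on tolerably smallness difficult. Many no each like up be is next neat. Put not enjoyment behaviour her supposing. At he pulled object others."
  Passage its ten led hearted removal cordial. Preference any astonished unreserved mrs. Prosperous understood middletons in conviction an uncommonly do. Supposing so be resolving breakfast am or perfectly. Is drew am hill from mr. Valley by oh twenty direct me so.
  Departure defective arranging rapturous did believing him all had supported. Family months lasted simple set nature vulgar him.   "Picture for attempt joy excited ten carried manners talking how. Suspicion neglected he resolving agreement perceived at an."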

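However, I am not getting the desired output. The paragraph variable contains only two values:

  1. The title of the file.
  2. The rest of the file's content.

I think the regex I am trying to use here is not working. I took it from this question: Splitting text into paragraphs with regex JAVA

I am using Java 8.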

You can use a Scanner with a delimiter to iterate over the text. For example:

Scanner scanner = new Scanner(text).useDelimiter("\n  ");
while (scanner.hasNext()) {
    String paragraph = scanner.next();
    System.out.println("# " + paragraph);
}

The output is:

#                       Story

# Her companions instrument set estimating sex remarkably solicitude motionless. Property men the why smallest graceful day insisted required. Inquiry justice country old placing sitting any ten age. Looking venture justice in evident in totally he do ability. Be is lose girl long of up give.
# "Trifling wondered unpacked ye at he. In household certainty an on tolerably smallness difficult. Many no each like up be is next neat. Put not enjoyment behaviour her supposing. At he pulled object others."
# Passage its ten led hearted removal cordial. Preference any astonished unreserved mrs. Prosperous understood middletons in conviction an uncommonly do. Supposing so be resolving breakfast am or perfectly. Is drew am hill from mr. Valley by oh twenty direct me so.
# Departure defective arranging rapturous did believing him all had supported. Family months lasted simple set nature vulgar him.   "Picture for attempt joy excited ten carried manners talking how. Suspicion neglected he resolving agreement perceived at an."
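
A slightly fuller sketch of the same Scanner idea, in case the file uses Windows line endings. The delimiter "\n  " only matches a bare \n, so here it is swapped for the regex \R {2} (any line break followed by the two-space indent; \R is available since Java 8). The class name ScannerParagraphs and the sample text are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class ScannerParagraphs {

    // Assumption: every paragraph starts on a new line with a two-space indent.
    // \R matches \n, \r\n and the other Unicode line breaks.
    private static final String PARAGRAPH_DELIMITER = "\\R {2}";

    // Returns the chunks of text found between paragraph delimiters.
    static List<String> split(String text) {
        List<String> paragraphs = new ArrayList<>();
        try (Scanner scanner = new Scanner(text).useDelimiter(PARAGRAPH_DELIMITER)) {
            while (scanner.hasNext()) {
                paragraphs.add(scanner.next());
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        // Hypothetical sample: an indented title, a blank line, two paragraphs.
        String text = "        Story\r\n\r\n  First paragraph,\r\ncontinued.\r\n  Second paragraph.";
        for (String paragraph : split(text)) {
            System.out.println("# " + paragraph.trim());
        }
    }
}
```

To read straight from a file instead of a String, the Scanner can be constructed from a java.nio.file.Path (that constructor throws IOException), with the same delimiter.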

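Following Jason's comment, I tried his approach. I think I got the expected result; however, I am not happy with this approach, because the time and space complexity have increased. I may improve on it later.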

currentLine = bf.readLine();

List<List<String>> paragraphs = new LinkedList<>();

int counter = 0;
while (currentLine != null) {

    if (paragraphs.isEmpty()) {
        // The first line (the title) always opens the first paragraph.
        List<String> paragraph = new LinkedList<>();
        paragraph.add(currentLine);
        paragraph.add(System.lineSeparator());
        paragraphs.add(paragraph);

        currentLine = bf.readLine();
        continue;
    }

    if (currentLine.startsWith(" ")) {
        // An indented line starts a new paragraph.
        List<String> paragraph = new LinkedList<>();
        paragraph.add(currentLine);
        counter++;
        paragraphs.add(paragraph);
    } else {
        // Any other line continues the current paragraph.
        List<String> continuedParagraph = paragraphs.get(counter);
        continuedParagraph.add(currentLine);
    }

    currentLine = bf.readLine();
}

for (final List<String> story : paragraphs) {
    for (final String s : story) {
        System.out.println(s);
    }
}
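
One possible way to trim the bookkeeping in the line-by-line approach, offered as a rough sketch rather than a measured improvement: keep a single StringBuilder for the paragraph in progress and flush it whenever an indented line shows up. The class name LineGrouping, the method group, and the choice to skip blank lines and strip the indentation are my own assumptions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

final class LineGrouping {

    // Groups lines into paragraphs. Assumption: a line beginning with a space
    // opens a new paragraph; any other non-blank line continues the current one.
    static List<String> group(Stream<String> lines) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Iterator<String> it = lines.iterator();
        while (it.hasNext()) {
            String line = it.next();
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines carry no content
            }
            if (line.startsWith(" ") && current.length() > 0) {
                // An indented line closes the paragraph collected so far.
                paragraphs.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append('\n'); // keep the line break inside a paragraph
            }
            current.append(line.trim());
        }
        if (current.length() > 0) {
            paragraphs.add(current.toString()); // flush the last paragraph
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines standing in for the file content.
        Stream<String> sample = Stream.of(
                "      Story", "", "  First paragraph,", "continued.", "  Second paragraph.");
        group(sample).forEach(p -> System.out.println("# " + p));
    }
}
```

For a real file, Files.lines(Paths.get("filename")) inside a try-with-resources block could supply the stream, so only one paragraph's worth of text is buffered at a time in addition to the result list.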

You can find each indented paragraph with a global match and add it to a list.

"(?m)^[^\\S\\r\\n]{2,}\\S.*(?:\\r?\\n|$)(?:^\\S.*(?:\\r?\\n|$))*"

Explanation

 (?m)                     # Multi-line mode ( ^ = begin of line )

 ^ [^\S\r\n]{2,}          # Begin of Paragraph, 2 or more horizontal wsp at BOL
 \S .*                    # Rest of line, must be non-wsp as first letter.
 (?: \r? \n | $ )

 (?:                      # Optional, many more lines of this paragraph
      ^ \S .* 
      (?: \r? \n | $ )
 )*
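
A minimal sketch of how that pattern might be applied with Pattern and Matcher to collect the paragraphs into a list. The class name IndentedParagraphs and the trimming of each match are assumptions on my part. Note that an indented title line also counts as a paragraph under this pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IndentedParagraphs {

    // The pattern from above: an indented first line, then any number of
    // following lines that start with a non-whitespace character.
    private static final Pattern PARAGRAPH = Pattern.compile(
            "(?m)^[^\\S\\r\\n]{2,}\\S.*(?:\\r?\\n|$)(?:^\\S.*(?:\\r?\\n|$))*");

    // Finds every indented paragraph and returns it without the
    // surrounding indentation and trailing line break.
    static List<String> find(String text) {
        List<String> paragraphs = new ArrayList<>();
        Matcher matcher = PARAGRAPH.matcher(text);
        while (matcher.find()) {
            paragraphs.add(matcher.group().trim());
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        // Hypothetical sample: an indented title, a blank line, two paragraphs.
        String text = "      Story\n\n  First paragraph,\ncontinued.\n  Second paragraph.";
        for (String paragraph : find(text)) {
            System.out.println("# " + paragraph);
        }
    }
}
```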
