
Count number of lines in a string in java - BufferedReader behavior

I am using the function countLines to count the number of lines in a string. It uses StringReader and BufferedReader. But for the string test in my example I get a different result than I expected. Can somebody verify this scenario and tell me whether BufferedReader behaves as expected?

package test;

import java.io.BufferedReader;
import java.io.StringReader;

public class LineCountTest {

    private static final String test = "This is a\ntest string\n\n\n";
    private static final String test2 = "This is a\ntest string\n\n\n ";

    public static void main(String[] args) {
        System.out.println("Line count: " + countLines(test));
        System.out.println("Line count: " + countLines(test2));
    }

    private static int countLines(String s) {
        try (
                StringReader sr = new StringReader(s);
                BufferedReader br = new BufferedReader(sr)
        ) {
            int count = 0;
            for (String line = br.readLine(); line != null; line = br.readLine()) {
                count++;
            }
            return count;
        } catch (Exception e) {
            return -1;
        }
    }

}

I expected countLines to return 5 in both cases, but it returns 4 for the first string.

Background: I actually need the value of line to fill an array of strings and expected the last element to be the empty string.

Edit: I already know that

String[] lines = s.split("\n", -1);
int count = lines.length;

will give me the correct/expected number of lines. I am asking only for performance reasons, and to find out whether BufferedReader is behaving correctly.

Check this code.

class LineCountTest
{
    private static final String test = "This is a\ntest string\n\n\n";
    private static final String test2 = "This is a\ntest string\n\n\n ";

    public static void main(String[] args) {
        System.out.println("Line count: " + countLines(test));
        System.out.println("Line count: " + countLines(test2));
    }

    private static int countLines(String s) {
        return (s + " ").split("\r?\n").length;
    }
}

This will solve your problem.

This code splits the string on \r\n or \n and returns the number of lines.

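The extra space is appended so that the last line is counted even if it is empty. Note that the last array element is then " " rather than the empty string, so this suits counting but not filling an array with the original line contents.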

The BufferedReader is behaving correctly.

The difference comes from the loop condition line != null: readLine() returns null as soon as no characters remain after the last line terminator.

In the string test, there is nothing after the last \n, so BufferedReader#readLine() returns null at that point. That is why the loop terminates and the output is 4.

In the string test2, there is a space after the last \n, which allows one more iteration, so the output is 5.

So you found that a last line is recognized only when it ends with a \n or is non-empty. An empty remainder after the final \n is not reported as a line.
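To make that rule visible, here is a minimal sketch that collects every string readLine() returns for the two inputs from the question. The class name ReadLineDemo and the helper readLines are made up for this illustration and are not part of the question's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not taken from the question.
class ReadLineDemo {

    // Collects every line that BufferedReader#readLine() returns for s.
    static List<String> readLines(String s) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(s))) {
            String line;
            // readLine() returns null once the end of the input is reached
            // without any further characters having been read.
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Nothing follows the last '\n', so no fifth line is reported.
        System.out.println(readLines("This is a\ntest string\n\n\n").size());
        // The trailing space is a non-empty last line, so it is reported.
        System.out.println(readLines("This is a\ntest string\n\n\n ").size());
    }
}
```

With the first input the list holds "This is a", "test string" and two empty strings. With the second input it additionally holds " " as a fifth element.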

For your purposes, one might be able to use:

String[] lines = "This is a\ntest string\n\n\n".split("\r?\n", 5);

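For this input the array will have 5 elements, the last one being the empty string. More generally, a positive limit caps the array at that many elements, while a negative limit such as -1 keeps every trailing empty string without a cap. Regex split is a bit slower, though.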
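The effect of the limit argument can be checked with a small sketch like the following. The class name SplitLimitDemo and the helper parts are invented for illustration; only String#split(String, int) itself is standard API.

```java
// Hypothetical demo class showing how the limit argument of
// String#split(String, int) changes the number of array elements.
class SplitLimitDemo {

    // Returns how many elements split produces for the given limit.
    static int parts(String s, int limit) {
        return s.split("\r?\n", limit).length;
    }

    public static void main(String[] args) {
        String test = "This is a\ntest string\n\n\n";

        // limit 0 (same as the one-argument split): trailing empty strings are dropped.
        System.out.println(parts(test, 0));
        // Negative limit: all trailing empty strings are kept.
        System.out.println(parts(test, -1));
        // Positive limit: at most that many elements; enough here to keep all five.
        System.out.println(parts(test, 5));
        // A smaller positive limit leaves the unsplit remainder in the last element.
        System.out.println(parts(test, 3));
    }
}
```

Because a positive limit is a cap, a string with more than five lines would still give five elements with limit 5, the last one containing the unsplit rest. When the line count is not known in advance, -1 is the variant that matches the split("\n", -1) call from the question.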

If you add an extra space to your first string:

private static final String test = "This is a\ntest string\n\n\n ";

you will get the same count for both. The main reason is in the for loop:

for (String line = br.readLine(); line != null; line = br.readLine()) {
    count++;
}

The update expression of the for loop, line = br.readLine(), returns a string only if at least one more character remains after the last "\n". Your first string has no character after it, so readLine() returns null and the loop stops. In your second string you added a space, and that space is read as one more line. That is why you get counts of 4 and 5.

If you use Java 8, then:

long lines = stringWithNewlines.chars().filter(x -> x == '\n').count() + 1;

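(The +1 at the end counts the text after the final \n. That is the last line when the string does not end with a newline, and the empty last line the question expects when it does.)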
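The same newline-counting idea can also fill the list of lines the question asks for in its background note, without a regex. The following is only a sketch under stated assumptions: the names ManualLineSplit and splitLines are hypothetical, it splits on '\n' alone (so a "\r" before it would stay in the line), and whether it is actually faster than split would have to be measured.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not from the original answers.
class ManualLineSplit {

    // Splits s on '\n' only and keeps trailing empty lines,
    // mirroring what s.split("\n", -1) returns.
    static List<String> splitLines(String s) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        int end;
        // Each '\n' found closes one line.
        while ((end = s.indexOf('\n', start)) >= 0) {
            lines.add(s.substring(start, end));
            start = end + 1;
        }
        // Whatever follows the last '\n' is the final line, possibly empty.
        lines.add(s.substring(start));
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = splitLines("This is a\ntest string\n\n\n");
        System.out.println("Line count: " + lines.size());
        System.out.println("Last line empty: " + lines.get(lines.size() - 1).isEmpty());
    }
}
```

For the question's first string this gives five elements with an empty string as the last one. Like the counting expression above, it reports one line for an empty input.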
