
How to find out which line separator BufferedReader#readLine() used to split the line?

I am reading a file via a BufferedReader:

String filename = ...
br = new BufferedReader( new FileInputStream(filename));
while (true) {
   String s = br.readLine();
   if (s == null) break;
   ...
}

I need to know whether the lines are separated by '\n' or '\r\n'. Is there a way I can find out?

I don't want to open the FileInputStream an extra time just to scan it up front. Ideally I would like to ask the BufferedReader, since it must know.

I am happy to override the BufferedReader to hack it, but I really don't want to open the file stream twice.

Thanks,

Note: the current line separator (returned by System.getProperty("line.separator")) cannot be used, as the file could have been written by another app on another operating system.

To be in phase with the BufferedReader class, you may use the following method, which handles the \n, \r, \n\r and \r\n line separators:

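// Requires: java.io.BufferedInputStream, java.io.File, java.io.FileInputStream,
//           java.io.IOException, java.io.InputStream
public static String retrieveLineSeparator(File file) throws IOException {
    // try-with-resources closes the stream even if an exception is thrown
    try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
        int current;
        // read() returns -1 at end of file, which is more reliable than available()
        while ((current = in.read()) != -1) {
            if (current == '\n' || current == '\r') {
                // First line-end character found; check whether a different one follows
                int next = in.read();
                if (next != current && (next == '\n' || next == '\r')) {
                    return "" + (char) current + (char) next;
                }
                return String.valueOf((char) current);
            }
        }
    }
    return null; // the file contains no line separator
}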

After reading the Java docs (I confess to being a Pythonista), it seems that there isn't a clean way to determine the line-end encoding used in a specific file.

The best thing I can recommend is that you use BufferedReader.read() and iterate over every character in the file. Something like this:

String filename = ...
BufferedReader br = new BufferedReader(new FileReader(filename));
StringBuilder l = new StringBuilder();
int c;
while ((c = br.read()) != -1) {
    if (c == '\n') {
        // line ended with "\n"
        // do stuff, not sure what you want with the endl encoding
        l.setLength(0);
    } else if (c == '\r') {
        // peek at the next character to tell "\r\n" from a bare "\r"
        br.mark(1);
        if (br.read() == '\n') {
            // do extra stuff since you know that you've got a \r\n
        } else {
            br.reset(); // bare "\r": put the peeked character back
        }
        // do stuff with the endl-free line, then start a new one
        l.setLength(0);
    } else {
        l.append((char) c);
    }
}
// l may still hold a last line that had no terminator
...

BufferedReader.readLine() does not provide any means of determining what the line break was. If you need to know, you'll need to read the characters in yourself and find the line breaks yourself.

You may be interested in the internal LineBuffer class from Guava (as well as the public LineReader class it's used in). LineBuffer provides a callback method void handleLine(String line, String end), where end is the line break characters. You could probably base something that does what you want on that. An API might look something like public Line readLine(), where Line is an object that contains both the line text and the line end.
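As a rough illustration of the readLine()-returns-a-Line idea, here is a minimal sketch that does not use Guava. The names LineWithEndingReader and Line are hypothetical, invented for this example; the terminators recognised ("\n", "\r", "\r\n") are the same ones BufferedReader.readLine() treats as line ends.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical helper: reads lines and keeps the terminator of each one. */
final class LineWithEndingReader implements AutoCloseable {

    /** A line's text plus the terminator that ended it ("" at end of input). */
    static final class Line {
        final String text;
        final String end;

        Line(String text, String end) {
            this.text = text;
            this.end = end;
        }
    }

    private final PushbackReader in;

    LineWithEndingReader(Reader reader) {
        // One character of pushback is enough to look past a '\r'
        this.in = new PushbackReader(reader, 1);
    }

    /** Returns the next line, or null once the input is exhausted. */
    Line readLine() throws IOException {
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                return new Line(text.toString(), "\n");
            }
            if (c == '\r') {
                int next = in.read();
                if (next == '\n') {
                    return new Line(text.toString(), "\r\n");
                }
                if (next != -1) {
                    in.unread(next); // bare '\r': push the lookahead back
                }
                return new Line(text.toString(), "\r");
            }
            text.append((char) c);
        }
        // End of input: emit a final unterminated line only if it has content
        return text.length() > 0 ? new Line(text.toString(), "") : null;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    public static void main(String[] args) throws IOException {
        try (LineWithEndingReader r =
                 new LineWithEndingReader(new StringReader("a\r\nb\nc\rd"))) {
            for (Line line = r.readLine(); line != null; line = r.readLine()) {
                String shown = line.end.replace("\r", "\\r").replace("\n", "\\n");
                System.out.println(line.text + " ended by [" + shown + "]");
            }
        }
    }
}
```

For a real file you would probably pass in a BufferedReader wrapped around an InputStreamReader, so the single-character reads stay cheap and the file is still opened only once.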

The answer would be: you can't find out what the line ending was.

I was looking into what can cause line endings in the same function. After looking at the BufferedReader source code, I can say that BufferedReader.readLine ends a line on '\r' or '\n' and skips a leftover '\n' that immediately follows a '\r'. This is hardcoded and does not depend on any settings.
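A small demo can make this behaviour visible. The class and method names below are made up for illustration; the only real API used is BufferedReader.readLine(). The same two lines come back whichever terminator was used, which is why the separator cannot be recovered afterwards.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class ReadLineTerminatorDemo {

    /** Reads all lines of the given text with BufferedReader.readLine(). */
    static List<String> readAllLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String s;
            while ((s = br.readLine()) != null) {
                lines.add(s);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // All three inputs yield the same list of lines
        System.out.println(readAllLines("one\ntwo\n"));
        System.out.println(readAllLines("one\r\ntwo\r\n"));
        System.out.println(readAllLines("one\rtwo\r"));
        // "\n\r" is not a single terminator: it is read as "\n" then "\r",
        // so an empty line appears between "one" and "two"
        System.out.println(readAllLines("one\n\rtwo"));
    }
}
```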

BufferedReader does not accept a FileInputStream; its constructor takes a Reader, so the stream has to be wrapped, e.g. in an InputStreamReader.

No, you cannot find out the line terminator that was used in the file being read by BufferedReader. That information is lost while reading the file.

Unfortunately, all answers below are incorrect.

Edit: And yes, you can always extend BufferedReader to include the additional functionality you desire.

If you happen to be reading this file into a Swing text component, then you can just use the JTextComponent.read(...) method to load the file into the Document. Then you can use:

textComponent.getDocument().getProperty( DefaultEditorKit.EndOfLineStringProperty );

to get the actual EOL string that was used in the file.

Maybe you could use Scanner instead.

You can pass a regular expression to Scanner#useDelimiter() to set a custom delimiter.

String regex="(\r)?\n";
String filename=....;
Scanner scan = new Scanner(new FileInputStream(filename));
scan.useDelimiter(Pattern.compile(regex));
while (scan.hasNext()) {
    String str= scan.next();
    // todo
}

You could use the code below to convert a BufferedReader to a Scanner:

 new Scanner(bufferedReader);
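Note that next() with a delimiter does not report which delimiter matched. If a Scanner is used anyway, one possible way to learn the first separator is Scanner#findWithinHorizon, which returns the text that matched. The sketch below is only an assumption about how that could be wired up (the class and method names are hypothetical), and it consumes the input up to and including the first separator, so it suits a detection pass better than line-by-line processing.

```java
import java.io.StringReader;
import java.util.Scanner;

final class ScannerSeparatorSketch {

    /**
     * Returns the first line separator in the source ("\r\n", "\r" or "\n"),
     * or null if there is none. The caller keeps ownership of the source.
     */
    static String firstLineSeparator(Readable source) {
        Scanner scan = new Scanner(source);
        // "\r\n" is listed first so a Windows line end is not reported as a bare "\r".
        // Horizon 0 means "search without bound"; this may buffer the whole input.
        return scan.findWithinHorizon("\r\n|\r|\n", 0);
    }

    public static void main(String[] args) {
        String sep = firstLineSeparator(new StringReader("first\r\nsecond\r\n"));
        System.out.println("\r\n".equals(sep) ? "CRLF" : "something else");
    }
}
```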

Not sure if this is useful, but sometimes I need to find out the line delimiter after I've already read the file, much further down the road.

In this case I use this code:

/**
* <h1> Identify which line delimiter is used in a string </h1>
*
* This is useful when processing files that were created on different operating systems.
*
* @param str - the string with the mystery line delimiter.
* @return  the line delimiter for windows, {@code \r\n}, <br>
*           unix/linux {@code \n} or legacy mac {@code \r} <br>
*           if none can be identified, it falls back to unix {@code \n}
*/
public static String identifyLineDelimiter(String str) {
    if (str.matches("(?s).*(\\r\\n).*")) {     //Windows //$NON-NLS-1$
        return "\r\n"; //$NON-NLS-1$
    } else if (str.matches("(?s).*(\\n).*")) { //Unix/Linux //$NON-NLS-1$
        return "\n"; //$NON-NLS-1$
    } else if (str.matches("(?s).*(\\r).*")) { //Legacy mac os 9. Newer OS X use \n //$NON-NLS-1$
        return "\r"; //$NON-NLS-1$
    } else {
        return "\n";  //fallback onto '\n' if nothing matches. //$NON-NLS-1$
    }
}

If you are using Groovy, you can simply do:

def lineSeparator = new File('path/to/file').text.contains('\r\n') ? '\r\n' : '\n'
