
How to identify a text file's line separator, and can you copy it when writing into that text file?

Working in Java, a homework problem is asking me to:

read a file, manipulate the strings and lines in it (not the problem), and write to the file while keeping the line separator that was used in the original file (big problem). The JUnit tests will use multiple different line separators for their file inputs.

My question is: what methods can I use to identify which line separator the text file is using?

The text files that my project reads can use \r, \n, \r\n, or System.lineSeparator() as the separator. When I write to the text file, I also need to match the original line separator so that the file stays OS friendly.

    // Requires: java.io.IOException, java.nio.file.Files, java.nio.file.Paths
    // Read the whole file into one String so the separators are preserved.
    String data = "";
    try {
        data = new String(Files.readAllBytes(Paths.get(path)));
    } catch (IOException e) {
        e.printStackTrace();
    }
    // Check "\r\n" first: a file that uses "\r\n" also contains "\n" and "\r".
    String linesep;
    if (data.contains("\r\n")) {
        linesep = "\r\n";
    } else if (data.contains("\n")) {
        linesep = "\n";
    } else if (data.contains("\r")) {
        linesep = "\r";
    } else {
        // No separator found at all; fall back to the platform default.
        linesep = System.lineSeparator();
    }

The requirements as you showed them to us say:

JUnit tests will use multiple different line separators for their file inputs.

They do NOT say that each test will use a consistent line separator throughout the file. They also do NOT say that the last line will always end with a separator.

If you write your code to find out which single separator is used, it will break when a file has mixed separators.

So what you need to do is preserve the separator at the end of each line, including empty lines. You also need to deal with the final line, which may have no separator.

Hint: the line separator characters are just characters, so you could include them in the line strings ... if you decide to split the input into lines at all.
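One way to act on that hint, sketched under the assumption that the whole file fits in memory as a String: split the text into segments that each still carry their own terminator, edit only the content part, and concatenate the segments again. The class name SeparatorPreservingLines and its methods are hypothetical names chosen for illustration; they are not part of the assignment or of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeparatorPreservingLines {
    // A segment is either (possibly empty) content followed by \r\n, \r, or \n,
    // or a non-empty final line that has no terminator.
    private static final Pattern SEGMENT =
            Pattern.compile("([^\\r\\n]*)(\\r\\n|\\r|\\n)|([^\\r\\n]+)");

    // Splits the text into segments; each segment keeps its own terminator.
    static List<String> split(String text) {
        List<String> segments = new ArrayList<>();
        Matcher m = SEGMENT.matcher(text);
        while (m.find()) {
            segments.add(m.group());
        }
        return segments;
    }

    // Applies 'edit' to the content of every line and re-attaches the
    // terminator that originally followed that line.
    static String transform(String text, UnaryOperator<String> edit) {
        StringBuilder out = new StringBuilder();
        Matcher m = SEGMENT.matcher(text);
        while (m.find()) {
            if (m.group(2) != null) {
                out.append(edit.apply(m.group(1))).append(m.group(2));
            } else {
                out.append(edit.apply(m.group(3))); // last line, no terminator
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "one\r\ntwo\n\nthree\rfour";
        String result = transform(input, String::toUpperCase);
        // Mixed separators, the empty line, and the unterminated last line survive.
        System.out.println(result.equals("ONE\r\nTWO\n\nTHREE\rFOUR")); // true
    }
}
```

Because the terminators are never normalized, this approach does not need to know which separator a file uses, and it copes with files that mix several.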

... and can you copy it when writing into that text file?

See above!

As far as I know, there is no way to tell with certainty what the line separator for a particular file is without some additional information that is not in the file itself.

As others have pointed out, the carriage-return and line-feed characters are just characters; there is nothing special about them. It is only by convention that they act as separators, and the convention on Windows differs from the one on Linux and Mac OS.

However, especially since it sounds like your program will be writing the files, you can try to make some assumptions:

  • Each file will use one and only one of the three "standard" line separators: \r, \n, or \r\n

  • Each file will not contain either of the other two line separators

If you can safely make those assumptions, then you can simply read the file (as a binary file, not as a text file) and inspect the characters, looking for one of the line endings.
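Under those two assumptions, detection can be a single scan for the first \r or \n. The following is a minimal sketch rather than a definitive implementation: the class SeparatorDetector and its method names are hypothetical, UTF-8 is an assumed encoding (the question does not state one), and falling back to System.lineSeparator() for a file that contains no line ending is also an assumption.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SeparatorDetector {
    // Returns the first line ending found in the text: "\r\n", "\r", or "\n".
    static String detect(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                return "\n";
            }
            if (c == '\r') {
                // A \r immediately followed by \n is the two-character separator.
                boolean followedByLf = i + 1 < text.length() && text.charAt(i + 1) == '\n';
                return followedByLf ? "\r\n" : "\r";
            }
        }
        // Assumption: a file without any line ending gets the platform default.
        return System.lineSeparator();
    }

    // Reads the raw bytes so that no reader silently normalizes the line endings.
    static String detectInFile(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return detect(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

Treating \r\n as a unit matters here: testing for \n alone would misreport a Windows-style file as using \n.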

To append new lines to the file with a matching separator, you can set the system line separator with, e.g.:

System.setProperty("line.separator", "\r\n");

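using the line separator that you found in the file.

That should cause the new separator to be used when you write to the file normally. Note, however, that System.lineSeparator() always returns the initial value of line.separator, and on recent JDKs classes such as BufferedWriter and PrintWriter may take their separator from it, so changing the property at run time is not guaranteed to have any effect.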
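If you would rather not change a JVM-wide property, a sketch of an alternative is to append the separator found in the file explicitly. SeparatorAwareAppender and appendLine are hypothetical names, UTF-8 is an assumed encoding, and the sketch assumes the file already ends with a separator (otherwise the appended text would continue the last line).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SeparatorAwareAppender {
    // Appends one line followed by the given separator to an existing file.
    static void appendLine(Path file, String line, String separator) throws IOException {
        byte[] bytes = (line + separator).getBytes(StandardCharsets.UTF_8);
        Files.write(file, bytes, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sep-demo", ".txt");
        Files.write(file, "first\r\nsecond\r\n".getBytes(StandardCharsets.UTF_8));

        // "\r\n" stands in for whatever separator was detected in the file.
        appendLine(file, "third", "\r\n");

        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        System.out.println(content.equals("first\r\nsecond\r\nthird\r\n")); // true
        Files.delete(file);
    }
}
```

Passing the separator as an argument keeps the choice local to each write, so files with different line endings can be handled in the same run.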
