Removing linefeeds using Java

I have a tab-delimited (\t) text file with a mix of newlines (CR/LF, i.e. \r\n, or "\n") and formfeeds (LF or \f). The newlines appear as the expected "\n", but the formfeeds are also used as internal field delimiters. Example:

COL_1   COL_2   COL_3    COL_4
1       A\fB    C\fD     2    

Using Java, I was able to remove the formfeeds only after setting line.separator to \r (for CR/LF, i.e. \r\n) and then reading the file with FileReader.read(), checking for '\n':

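import java.io.FileReader;
import java.io.FileWriter;

private void fixMe() throws Exception {

  FileReader in  = new FileReader("C:\\somefile.txt");
  // The output side must be a Writer; FileReader has no write() method
  FileWriter out = new FileWriter("C:\\someotherfile.txt");

  System.setProperty("line.separator", "\r");

  try {
    int c;
    // Copy every character except '\n'
    while ((c = in.read()) != -1) {
        if (c != '\n') {
             out.write(c);
        }
    }
  }
  ...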

It appears that in.read() by default reads "\n" as two characters. I can remove \f, but now I'll have to write another method to change \r to "\n" and reset line.separator as part of that method. Is there a better way to do this? I want to use Scanner, but that solution points to resetting line.separator again, which I want to avoid.

A better way is to read the whole file content, remove "\n", "\r\n" and "\f", and then save the result wherever you want.

See example:

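// Read the whole file into memory (decoded with the platform default charset)
String content = new String(Files.readAllBytes(Paths.get("path-to-file")));
// Strip every \n, \r\n and \f; note that this also removes the row terminators
String processedContent = content.replaceAll("\\n|\\r\\n|\\f", "");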
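
The pattern above removes every line terminator too, so all rows end up on a single line. If the rows should keep their CR/LF endings and only the in-field characters should go, a negative lookbehind is one option. This is a minimal sketch, assuming rows end in \r\n and the stray characters are form feeds or line feeds not preceded by a carriage return; the class and method names (FieldCleaner, stripStray) are hypothetical.

```java
import java.util.regex.Pattern;

final class FieldCleaner {
    // A form feed, or a line feed that is NOT directly preceded by a carriage return.
    private static final Pattern STRAY = Pattern.compile("\\f|(?<!\\r)\\n");

    /** Removes form feeds and bare line feeds; CR/LF pairs are left intact. */
    static String stripStray(String content) {
        return STRAY.matcher(content).replaceAll("");
    }

    public static void main(String[] args) {
        // One sample row: a form feed and a bare LF inside fields, CR/LF at the end.
        String row = "1\tA\fB\tC\nD\t2\r\n";
        System.out.println(stripStray(row).equals("1\tAB\tCD\t2\r\n")); // prints true
    }
}
```

No system property has to be changed for this to work, because the regular expression only looks at the characters in the string.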

According to your question, it seems you want to skip a lone line feed (LF, '\n') in the file, but not when it is part of a CR/LF pair (\r\n), so keeping track of the last character read might solve your issue.

import java.io.FileReader;
import java.io.FileWriter;

private void fixMe() throws Exception {

  FileReader in  = new FileReader("C:\\somefile.txt");
  // The output side must be a Writer; FileReader has no write() method
  FileWriter out = new FileWriter("C:\\someotherfile.txt");

  // Character 10 is LF ('\n') and 13 is CR ('\r'); form feed ('\f') is character 12
  try {
    int c;
    int prevCharRead = 0;
    while ((c = in.read()) != -1) {
        if (c == 10 && prevCharRead != 13) {
            // A line feed (LF) with no CR before it: skip it, or implement whatever logic you want.
        } else {
            out.write(c);
        }

        prevCharRead = c;
    }
  }
  ...
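
A self-contained variant of the loop above, as a sketch rather than a definitive implementation. It works on any Reader and Writer, so it can be tried on in-memory strings. It also drops form feeds (character 12), on the assumption that the \f shown in the question's example may be a literal form feed. The class and method names (StrayLineFeedFilter, copyWithoutStray) are hypothetical. Reader.read() returns each character unchanged and does not consult the line.separator property, so no system property needs to be set.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

final class StrayLineFeedFilter {
    /** Copies in to out, dropping form feeds and any LF not directly preceded by CR. */
    static void copyWithoutStray(Reader in, Writer out) throws IOException {
        int prev = -1; // previous character read, -1 before the first one
        int c;
        while ((c = in.read()) != -1) {
            boolean bareLf = c == '\n' && prev != '\r';
            if (!bareLf && c != '\f') {
                out.write(c);
            }
            prev = c;
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        copyWithoutStray(new StringReader("1\tA\nB\tC\fD\t2\r\n"), out);
        System.out.println(out.toString().equals("1\tAB\tCD\t2\r\n")); // prints true
    }
}
```

For real files, the same method can be given a Files.newBufferedReader(...) and a Files.newBufferedWriter(...) opened in a try-with-resources statement, so that both streams are closed even if an exception is thrown.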
