
Result of Java split() varies when working with a string of numbers

Why does Java's String.split() produce different results for a string defined in code and for the same string read from a file, when numbers are involved? Specifically, I have a file called "test.txt" that contains characters and numbers separated by spaces:

G H 5 4

The split method does not split on spaces as expected. But if a string variable is created in code with the same characters and numbers separated by spaces, split() returns four individual strings, one for each character and number. The code below demonstrates this difference:

   import java.io.BufferedReader;
   import java.io.FileReader;
   import java.io.IOException;

   public class SplitNumber {

     public static void main(String[] args) {
       // Read the first line of the text file and split it on whitespace.
       // try-with-resources closes the reader even if an exception is thrown.
       try (BufferedReader bufferedReader = new BufferedReader(new FileReader("test.txt"))) {
         String firstLine = bufferedReader.readLine();
         if (firstLine != null) {
           String[] firstLineNumbers = firstLine.split("\\s+");
           System.out.println("First line array length: " + firstLineNumbers.length);

           for (String token : firstLineNumbers) {
             System.out.println(token);
           }
         }
       } catch (IOException exception) {
         System.out.println("IOException occurred");
         exception.printStackTrace();
       }

       // Split the same characters defined directly in code.
       String numberString = "G H 5 4";
       String[] numbers = numberString.split("\\s+");
       System.out.println("Numbers array length: " + numbers.length);

       for (String token : numbers) {
         System.out.println(token);
       }
     }
   }

The result is:

First line array length: 3
G
H
5 4
Numbers array length: 4
G
H
5
4

Why are the numbers from the file not split the same way as the identical string defined in code?
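One way to check whether the line read from the file contains something other than a plain space is to print the code point of every unexpected character. The sketch below is an illustration, not part of the original question: the class name `FindOddWhitespace` and method `describeNonAscii` are hypothetical, and the sample line assumes the hidden character is a no-break space (U+00A0). In practice you would pass `firstLine` from the file instead.

```java
import java.util.ArrayList;
import java.util.List;

public class FindOddWhitespace {

    // Returns a description of every character in the line that is outside
    // printable ASCII (U+0020..U+007E), e.g. "index 5: U+00A0 NO-BREAK SPACE".
    static List<String> describeNonAscii(String line) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < line.length(); ) {
            int cp = line.codePointAt(i);
            if (cp < 0x20 || cp > 0x7E) {
                // Character.getName returns the Unicode name of the code point
                found.add(String.format("index %d: U+%04X %s", i, cp, Character.getName(cp)));
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical line: the gap between 5 and 4 is a no-break space (U+00A0)
        String line = "G H 5\u00A04";
        for (String description : describeNonAscii(line)) {
            System.out.println(description);
        }
    }
}
```

For the sample line this prints `index 5: U+00A0 NO-BREAK SPACE`; for a line containing only ordinary spaces it prints nothing.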

Based on feedback, I changed the regex to split("[\\s\\h]+"), which resolved the issue: the numbers from the file were split correctly, which clearly indicated that the text file I was using contained a different whitespace-like character. (By default, Java's \s matches only [ \t\n\x0B\f\r], while \h also matches horizontal whitespace such as the no-break space U+00A0.) I then replaced the contents of the file (using Notepad), reverted to split("\\s+"), and found that it worked correctly this time. So at some point I must have introduced a different whitespace-like character into the file (maybe through a copy/paste).

In the end, the takeaway is that I should use split("[\\s\\h]+") when reading from a file that I want to split on spaces, because it covers more cases that may not be immediately obvious.
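The following sketch compares the regexes on a line that is assumed (the question does not confirm which character it was) to contain a no-break space (U+00A0) between 5 and 4. The class name `SplitWhitespaceDemo` is hypothetical. Besides `[\\s\\h]+`, it shows the `(?U)` embedded flag (`Pattern.UNICODE_CHARACTER_CLASS`), an alternative that makes `\s` follow the Unicode White_Space property.

```java
import java.util.Arrays;

public class SplitWhitespaceDemo {

    // Hypothetical problem line: the gap between 5 and 4 is a no-break space
    // (U+00A0) rather than an ordinary space (U+0020).
    static final String LINE = "G H 5\u00A04";

    public static void main(String[] args) {
        // Default \s matches only [ \t\n\x0B\f\r], so U+00A0 is not a separator
        System.out.println(Arrays.toString(LINE.split("\\s+")));

        // \h (horizontal whitespace, Java 8+) includes U+00A0
        System.out.println(Arrays.toString(LINE.split("[\\s\\h]+")));

        // (?U) enables UNICODE_CHARACTER_CLASS, so \s follows Unicode White_Space
        System.out.println(Arrays.toString(LINE.split("(?U)\\s+")));
    }
}
```

The first call yields three tokens, the last being `5`, a no-break space, and `4` (it looks like `[G, H, 5 4]` when printed), which matches the symptom in the question; the other two both print `[G, H, 5, 4]`.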

Thanks to all for helping me find the root cause of my issue.

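Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.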

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM