How do I use System.getProperty("line.separator").toString()?
I have a tab-delimited String (representing a table) that is passed to my method. When I print it to the command line, it appears as a table with rows:
https://i.stack.imgur.com/2fAyq.gif
The command window is correctly buffered. My thinking is that there is definitely a newline character before or after each row.
My problem is that I want to split the incoming string into individual strings representing the rows of the table. So far I have:
private static final String newLine = System.getProperty("line.separator").toString();
private static final String tab = "\t";
private static String[] rows;
...
rows = tabDelimitedTable.split(newLine); //problem is here
System.out.println();
System.out.println("################### start debug ####################");
System.out.println((tabDelimitedTable.contains(newLine)) ? "True" : "False");
System.out.println("#################### end debug###################");
System.out.println();
Output:
################### start debug ####################
False
#################### end debug###################
Obviously there is something in the string telling the OS to start a new line, yet it apparently contains no newline characters.
Running the latest JDK on Windows XP SP3.
Any ideas?
You must NOT assume that an arbitrary input text file uses the "correct" platform-specific newline separator. This seems to be the source of your problem; it has little to do with regex.
To illustrate, on the Windows platform, System.getProperty("line.separator") is "\r\n" (CR+LF). However, when you run your Java code on this platform, you may very well have to deal with an input file whose line separator is simply "\n" (LF). Maybe this file was originally created on a Unix platform and then transferred to Windows in binary (instead of text) mode. There are many scenarios in which you must parse an input text file that does not use the current platform's newline separator.
(Incidentally, when a Windows text file is transferred to Unix in binary mode, many editors display ^M at the end of each line, which confuses people who don't understand what is going on.)
When you are producing a text file as output, you should probably prefer the platform-specific newline separator, but when you are consuming a text file as input, it's probably not safe to assume that it correctly uses the platform-specific newline separator.
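To make the mismatch concrete, here is a minimal sketch of why the debug check in the question can print False. The class and method names are hypothetical, and the Windows separator is hard-coded as "\r\n" so the result does not depend on where the code runs:

```java
public class SeparatorMismatchDemo {
    // Returns true only if the text contains the exact separator sequence.
    static boolean usesSeparator(String text, String separator) {
        return text.contains(separator);
    }

    public static void main(String[] args) {
        String unixStyle = "row1\tA\nrow2\tB\n"; // LF-only input (assumed for illustration)
        String windowsSeparator = "\r\n";        // what line.separator holds on Windows

        // The input has line breaks, but not the CR+LF pair that Windows expects.
        System.out.println(usesSeparator(unixStyle, windowsSeparator)); // false
        System.out.println(usesSeparator(unixStyle, "\n"));             // true
    }
}
```

The console still shows separate rows because it starts a new line on a bare LF, even though the string never contains the two-character platform separator.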
One way to solve the problem is to use, e.g., java.util.Scanner. It has a nextLine() method that returns the next line (if one exists), correctly handling any inconsistency between the platform's newline separator and the input text file.
You can also combine two Scanner instances, one to scan the file line by line and another to scan the tokens of each line. Here's a simple usage example that breaks each line into a List<String>. The entire file therefore becomes a List<List<String>>.
This is probably a better approach than reading the entire file into one huge String and then calling split to break it into lines (which are then split into parts).
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Sample input that mixes LF, CR+LF and CR line terminators.
String text
    = "row1\tblah\tblah\tblah\n"
    + "row2\t1\t2\t3\t4\r\n"
    + "row3\tA\tB\tC\r"
    + "row4";
System.out.println(text);
// row1 blah blah blah
// row2 1 2 3 4
// row3 A B C
// row4
List<List<String>> input = new ArrayList<List<String>>();
// The outer Scanner walks the text line by line.
Scanner sc = new Scanner(text);
while (sc.hasNextLine()) {
    // The inner Scanner splits one line into tab-separated tokens.
    Scanner lineSc = new Scanner(sc.nextLine()).useDelimiter("\t");
    List<String> line = new ArrayList<String>();
    while (lineSc.hasNext()) {
        line.add(lineSc.next());
    }
    input.add(line);
}
System.out.println(input);
// [[row1, blah, blah, blah], [row2, 1, 2, 3, 4], [row3, A, B, C], [row4]]
See also: Validating input using java.util.Scanner (has many examples of usage)
Try:
rows = tabDelimitedTable.split("[" + newLine + "]");
This should solve the regex problem.
Also, not that important, but the return type of System.getProperty("line.separator") is String, so there is no need to call toString().
On Windows, line.separator is a CR/LF combination (reference here).
The Java String.split() method takes a regular expression. So I think there's some confusion here.
Try BufferedReader.readLine() instead of all this complication. It recognizes \n, \r, and \r\n as line terminators.
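As a rough sketch of the readLine() suggestion, the following wraps the incoming string in a StringReader and collects one row per call. The class name, method name, and sample data are made up for illustration; a file-based version would wrap a FileReader instead:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {
    // Collects the rows of a table held in a String, one row per readLine() call.
    static List<String> readRows(String table) {
        List<String> rows = new ArrayList<String>();
        try (BufferedReader reader = new BufferedReader(new StringReader(table))) {
            String line;
            // readLine() returns null at end of input and strips the terminator.
            while ((line = reader.readLine()) != null) {
                rows.add(line);
            }
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Mixed terminators: LF, CR+LF and a lone CR.
        String table = "row1\tblah\nrow2\t1\r\nrow3\tA\rrow4";
        System.out.println(readRows(table).size()); // 4
    }
}
```

Each returned row can then be split on "\t" to get the individual cells.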
I think your problem is that String.split() treats its argument as a regex, and regexes treat newlines specially. You may need to explicitly create a regex object to pass to split() (there is another overload of it) and configure that regex to allow newlines by passing MULTILINE in the flags param of Pattern.compile(). Docs
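For what it's worth, a quick check suggests that MULTILINE is not needed here: in java.util.regex that flag only changes where ^ and $ match, and a pattern made of literal CR and LF characters matches them without any flags. The sketch below uses Pattern.split(), since String.split() accepts only a String regex; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

public class NewlineRegexDemo {
    // Splits the text with a pattern compiled without any flags.
    static String[] splitPlain(String text, String regex) {
        return Pattern.compile(regex).split(text);
    }

    public static void main(String[] args) {
        String text = "row1\r\nrow2\r\nrow3";

        // Literal CR+LF in the pattern matches CR+LF in the input, no flags required.
        System.out.println(splitPlain(text, "\r\n").length); // 3

        // Adding MULTILINE gives the same result, because the pattern has no ^ or $.
        System.out.println(Pattern.compile("\r\n", Pattern.MULTILINE).split(text).length); // 3
    }
}
```

So if the split in the question returns a single element, the more likely cause is that the input does not contain the platform separator at all, rather than a missing regex flag.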
The other responders are correct that split() takes a regex as the argument, so you'll have to fix that first. The other problem is that you're assuming that the line break characters are the same as the system default. Depending on where the data is coming from and where the program is running, this assumption may not be correct.
Try this:
rows = tabDelimitedTable.split("[\\r\\n]+");
This should work regardless of what line delimiters are in the input, and will ignore blank lines.
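Here is a small sketch comparing that pattern with an alternative that keeps blank lines, in case empty rows are meaningful in your table. The second pattern, the class name, and the method names are my own additions, not part of the answer above:

```java
import java.util.Arrays;

public class SplitRowsDemo {
    // Collapses any run of CR/LF characters into one break, so blank lines vanish.
    static String[] splitSkippingBlanks(String table) {
        return table.split("[\\r\\n]+");
    }

    // Treats \r\n, \n or \r as exactly one terminator, so blank lines survive as "".
    static String[] splitKeepingBlanks(String table) {
        return table.split("\\r\\n|\\n|\\r");
    }

    public static void main(String[] args) {
        // Mixed terminators with one blank line after the first row.
        String table = "row1\tA\r\n\r\nrow2\tB\nrow3\tC";

        System.out.println(splitSkippingBlanks(table).length); // 3
        System.out.println(splitKeepingBlanks(table).length);  // 4
        System.out.println(Arrays.toString(splitKeepingBlanks(table)));
    }
}
```

Note that String.split() drops trailing empty strings in both cases, so a final line break does not produce an extra empty row.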