What is an efficient way to parse a String in Java?

How should I parse the following String in Java to extract the file path?

? stands for any number of arbitrary characters

_ stands for any number of whitespace characters (no newline)

?[LoadFile]_file_=_"foo/bar/baz.xml"?

Example:

10:52:21.212 [LoadFile] file = "foo/bar/baz.xml"

This should extract foo/bar/baz.xml

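import java.util.regex.Matcher;
import java.util.regex.Pattern;

// \s* allows any amount of whitespace; [^\"]+ stops at the closing quote
String regex = "\\[LoadFile\\]\\s*file\\s*=\\s*\"([^\"]+)\"";

Matcher m = Pattern.compile(regex).matcher(inputString);
String result = null;
if (m.find())
    result = m.group(1);
else
    System.out.println("No match found.");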

The String in result should be your file path (assuming I didn't make any mistakes).

Take a look at the Pattern class for help with regular expressions. They can be a very powerful string manipulation tool.
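As a rough illustration of the regex approach, here is a minimal, self-contained sketch. The class name LoadFileRegexDemo and the method extractPath are made up for this example, and the pattern assumes that "any number of white spaces" means zero or more spaces or tabs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LoadFileRegexDemo {
    // Zero or more spaces/tabs around "file" and "=", then everything up to
    // the next double quote is captured as group 1.
    private static final Pattern LOAD_FILE =
            Pattern.compile("\\[LoadFile\\][ \\t]*file[ \\t]*=[ \\t]*\"([^\"]*)\"");

    /** Returns the quoted path, or null when the line does not match. */
    static String extractPath(String line) {
        Matcher m = LOAD_FILE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "10:52:21.212 [LoadFile] file = \"foo/bar/baz.xml\"";
        System.out.println(extractPath(line)); // prints foo/bar/baz.xml
    }
}
```

Using find() rather than matches() means the pattern does not need a leading or trailing .* to skip the surrounding characters.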

Short answer: use subSequence().

if (line.contains("[LoadFile]"))
  // + 1 skips the opening quote so it is not part of the result
  result = line.subSequence(line.indexOf('"') + 1, line.lastIndexOf('"')).toString();

On my machine, this consistently takes less than 10,000 ns.

I am taking "efficient" to mean "faster".

The regex option is considerably slower (about 9 or 10 times slower). Its primary advantage is that it might be easier for another programmer to figure out what you are doing (though comments can serve the same purpose).

To make the regex option more efficient, pre-compile it:

private static final String FILE_REGEX = ".*\\[LoadFile\\]\\s+file\\s+=\\s+\"([^\"].+)\".*";
private static final Pattern FILE_PATTERN = Pattern.compile(FILE_REGEX);

But it is still slower: I record times between 80,000 and 100,000 ns.
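For anyone who wants to reproduce this kind of comparison, the following is a naive timing sketch based on System.nanoTime(). It is not the code used for the numbers quoted here; the class ExtractTiming and its methods are hypothetical, and results will vary with the JVM, the hardware, and JIT warm-up. A dedicated harness such as JMH would give more trustworthy figures.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ExtractTiming {
    private static final Pattern PRECOMPILED =
            Pattern.compile("\\[LoadFile\\]\\s+file\\s+=\\s+\"([^\"]+)\"");

    // Written after each run so the JIT cannot discard the extracted values.
    static volatile int sink;

    static String viaIndexOf(String line) {
        if (!line.contains("[LoadFile]")) return null;
        return line.substring(line.indexOf('"') + 1, line.lastIndexOf('"'));
    }

    static String viaRegex(String line) {
        Matcher m = PRECOMPILED.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** Average nanoseconds per call, measured after one warm-up pass. */
    static long averageNanos(Function<String, String> extractor, String line, int iterations) {
        int acc = 0;
        for (int i = 0; i < iterations; i++) {   // warm-up pass, not timed
            String r = extractor.apply(line);
            acc += (r == null) ? 0 : r.length();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {   // timed pass
            String r = extractor.apply(line);
            acc += (r == null) ? 0 : r.length();
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed / iterations;
    }

    public static void main(String[] args) {
        String line = "10:52:21.212 [LoadFile] file = \"foo/bar/baz.xml\"";
        System.out.println("indexOf: " + averageNanos(ExtractTiming::viaIndexOf, line, 100_000) + " ns");
        System.out.println("regex:   " + averageNanos(ExtractTiming::viaRegex, line, 100_000) + " ns");
    }
}
```

Averaging over many iterations after a warm-up pass tends to give much smaller per-call numbers than timing a single cold call, so figures from this sketch are not directly comparable to single-call measurements.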

The StringTokenizer option is more efficient than the regex:

if (line.contains("[LoadFile]")) {
  StringTokenizer tokenizer = new StringTokenizer(line, "\"");
  tokenizer.nextToken();
  result = tokenizer.nextToken();
}

This hovers around 40,000 ns for me, making it 2-3 times faster than the regex.

In this scenario, split() is also an option, which for me (using Java 6_13) is just a little faster than the StringTokenizer:

if (line.contains("[LoadFile]")) {
  String[] values = line.split("\"");
  result = values[1];
}

This averages about 35,000 ns for me.

Of course, none of these options checks for errors. Each one will get a little slower once you factor that in, but I think the subSequence() option will still beat them all. You have to know the exact parameters and expectations to figure out how fault-tolerant each option needs to be.
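To give an idea of what minimal error checking could look like for the index-based approach, here is a sketch. The class SafeQuoteExtractor and the choice to return null on malformed input are assumptions made for illustration; a real caller might prefer an exception or an Optional.

```java
final class SafeQuoteExtractor {
    /**
     * Returns the text between the first and last double quote of a
     * "[LoadFile]" line, or null if the marker or a quote pair is missing.
     */
    static String extractPath(String line) {
        if (line == null || !line.contains("[LoadFile]")) {
            return null;
        }
        int open = line.indexOf('"');
        int close = line.lastIndexOf('"');
        if (open < 0 || close <= open) {
            return null; // no quotes at all, or only a single quote
        }
        return line.substring(open + 1, close);
    }

    public static void main(String[] args) {
        String line = "10:52:21.212 [LoadFile] file = \"foo/bar/baz.xml\"";
        System.out.println(extractPath(line)); // prints foo/bar/baz.xml
    }
}
```

The extra checks are a null test and two integer comparisons, so they should add very little to the cost of the lookup itself.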

While regular expressions are nice and all, you can also use the class java.util.StringTokenizer to do the job. The advantage is more human-readable code.

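StringTokenizer tokenizer = new StringTokenizer(inputString, "\"");
tokenizer.nextToken(); // skip everything before the opening quote
// nextToken() returns a String; nextElement() returns Object and would not compile here
String path = tokenizer.nextToken();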

And there you go.

java.util.regex is your friend.

You could make the regular expression a bit shorter than jinguy's. Basically, capture just the right-hand side, without the quotes.

    String regex = ".* = \"(.*)\"";
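A small sketch of how this shorter pattern might be used follows. The class ShortRegexDemo is hypothetical. Note that this pattern expects exactly one space on each side of the equals sign, does not check for [LoadFile], and uses a greedy group that runs to the last quote on the line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ShortRegexDemo {
    // The shorter pattern: anything, then ' = "', then a greedy capture
    // up to the last double quote on the line.
    private static final Pattern SHORT = Pattern.compile(".* = \"(.*)\"");

    /** Returns the captured right-hand side, or null when nothing matches. */
    static String extractPath(String line) {
        Matcher m = SHORT.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "10:52:21.212 [LoadFile] file = \"foo/bar/baz.xml\"";
        System.out.println(extractPath(line)); // prints foo/bar/baz.xml
    }
}
```

Whether the looser matching is acceptable depends on how uniform the input lines are.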
