Java replaceAll() & split() irregularities

I know, I know, now I have two problems 'n all that, but regex here means I don't have to write two complicated loops. Instead, I have a regex that only I understand, and I'll be employed for yonks.

I have a string, say stack.overflow.questions[0].answer[1].postDate, and I need to get the [0] and the [1], preferably in an array. "Easy!" my neurons exclaimed, just use regex and the split method on your input string; so I came up with this:

String[] tokens = input.split("[^\\[\\d\\]]");

which produced the following:

[, , , , , , , , , , , , , , , , [0], , , , , , , [1]]

Oh dear. So, I thought, "what would replaceAll do in this instance?":

String onlyArrayIndexes = input.replaceAll("[^\\[\\d\\]]", "");

which produced:

[0][1]

Hmm. Why so? I'm looking for a two-element string array that contains "[0]" as the first element and "[1]" as the second. Why does split not work here, when the Javadoc says that both methods use the Pattern class?

To summarise, I have two questions: why does the split() call produce that large array with seemingly random space characters, and am I right in thinking the replaceAll works because the regex replaces all characters not matching "[", a number and "]"? What am I missing that means I expect them to produce similar output? (OK, that's three, and please don't answer "a clue?" to this one!)

Well, from what I can see the split does work: it gives you an array holding the pieces of the string left between matches, where a match is any single character that is not a bracket or a digit.

As for the replaceAll, I think your assumption is right: it removes everything that is not what you want, by replacing each match with "".

From the API documentation:

Splits this string around matches of the given regular expression.

This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.

The string "boo:and:foo", for example, yields the following results with these expressions:

Regex    Result
:        { "boo", "and", "foo" }
o        { "b", "", ":and:f" }
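The quoted Javadoc example can be reproduced directly. The sketch below is only an illustration: the class name SplitLimitDemo and the helper splitKeepingTrailing are made up for this example, and the negative limit is shown to make visible the trailing empty strings that the one-argument split drops.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: a negative limit tells split to keep trailing empty strings.
    static String[] splitKeepingTrailing(String s, String regex) {
        return s.split(regex, -1);
    }

    public static void main(String[] args) {
        String s = "boo:and:foo";
        System.out.println(Arrays.toString(s.split(":")));                  // [boo, and, foo]
        System.out.println(Arrays.toString(s.split("o")));                  // [b, , :and:f]
        System.out.println(Arrays.toString(splitKeepingTrailing(s, "o"))); // [b, , :and:f, , ]
    }
}
```

Note that only trailing empty strings are dropped by the one-argument form; the empty string between the two adjacent "o" delimiters stays in the array.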

This is not a direct answer to your question, however I want to show you a great API that will suit your need.

Check out Splitter from Google Guava.

So for your example, you would use it like this:

Iterable<String> tokens = Splitter.onPattern("[^\\[\\d\\]]").omitEmptyStrings().trimResults().split(input);

//Now you get back an Iterable which you can iterate over. Much better than an Array.
for(String s : tokens) {
   System.out.println(s);
}

This prints:
[0]
[1]
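If adding Guava is not an option, roughly the same effect seems achievable with the JDK alone. The following is a sketch under that assumption, using Pattern.splitAsStream (Java 8+) and filtering out the empty pieces; the class and method names are invented for this example.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class NonEmptySplit {
    // Same regex as in the question: any character that is not '[', a digit, or ']'.
    private static final Pattern NOT_INDEX_CHAR = Pattern.compile("[^\\[\\d\\]]");

    // Splits on the pattern and discards the empty strings between adjacent delimiters.
    static List<String> nonEmptyTokens(String input) {
        return NOT_INDEX_CHAR.splitAsStream(input)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(nonEmptyTokens("stack.overflow.questions[0].answer[1].postDate"));
    }
}
```

One caveat with any split-based approach: the brackets are not delimiters, so an input such as a[0][1] would yield the single token [0][1] rather than two tokens.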

split splits on boundaries defined by the regex you provide, so it's no great surprise you're getting lots of entries: nearly all of the characters in the string match your regex and so, by definition, are boundaries on which a split should occur.

replaceAll replaces matches for your regex with the replacement you give it, which in your case is an empty string.
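To make the difference concrete, here is a small sketch (the class and helper names are invented for illustration) that applies the question's regex three ways to the same input: counting the matches, deleting them with replaceAll, and using them as delimiters with split.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SameMatchesDemo {
    // The regex from the question: any single character that is not '[', a digit, or ']'.
    static final String REGEX = "[^\\[\\d\\]]";

    // Counts how many separate matches the regex finds in the input.
    static int countMatches(String input) {
        Matcher m = Pattern.compile(REGEX).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Counts the empty strings in the array returned by split.
    static int countEmptyTokens(String input) {
        int empty = 0;
        for (String token : input.split(REGEX)) {
            if (token.isEmpty()) {
                empty++;
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        String input = "stack.overflow.questions[0].answer[1].postDate";
        System.out.println("matches:      " + countMatches(input));
        System.out.println("replaceAll:   " + input.replaceAll(REGEX, ""));
        System.out.println("split length: " + input.split(REGEX).length);
        System.out.println("empty tokens: " + countEmptyTokens(input));
    }
}
```

Both methods see exactly the same matches. replaceAll deletes them and concatenates what is left, whereas split returns one array entry for whatever lies between each pair of neighbouring matches. When two matches are adjacent there is nothing between them, so the entry is an empty string, which is what looks like a stray space when the array is printed.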

If you're trying to grab the 0 and the 1, it's a trivial loop:

String text = "stack.overflow.questions[0].answer[1].postDate";
Pattern pat = Pattern.compile("\\[(\\d+)\\]");
Matcher m = pat.matcher(text);
List<String> results = new ArrayList<String>();
while (m.find()) {
    results.add(m.group(1)); // Or just .group() if you want the [] as well
}
String[] tokens = results.toArray(new String[0]);

Or if it's always exactly two of them:

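String text = "stack.overflow.questions[0].answer[1].postDate";
Pattern pat = Pattern.compile(".*\\[(\\d+)\\].*\\[(\\d+)\\].*");
Matcher m = pat.matcher(text);
// group() throws IllegalStateException unless a match succeeded, so check first
if (!m.find()) {
    throw new IllegalArgumentException("Expected two [n] indexes in: " + text);
}
String[] tokens = new String[2];
tokens[0] = m.group(1);
tokens[1] = m.group(2);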
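On Java 9 or later, the find loop above can also be written as a stream over Matcher.results(). This is a sketch of that variant, not part of the original answer; the class name IndexExtractor and the method indexes are hypothetical.

```java
import java.util.regex.Pattern;

class IndexExtractor {
    // One capturing group around the digits, so the brackets can be left out.
    private static final Pattern INDEX = Pattern.compile("\\[(\\d+)\\]");

    // Returns the digits of every [n] in the text, in order of appearance.
    static String[] indexes(String text) {
        return INDEX.matcher(text)
                .results()                 // Stream<MatchResult>, available since Java 9
                .map(r -> r.group(1))      // use r.group() instead to keep the brackets
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] tokens = indexes("stack.overflow.questions[0].answer[1].postDate");
        System.out.println(String.join(",", tokens));
    }
}
```

Unlike the fixed two-group pattern, this handles any number of indexes, including none, in which case the array is simply empty.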

The problem is that split is the wrong operation here.

In Ruby, I'd tell you to use string.scan(/\[\d+\]/), which would give you the array ["[0]","[1]"].

Java doesn't have a single-method equivalent, but we can write a scan method as follows:

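// Needs java.util.ArrayList, java.util.List, java.util.regex.Matcher and java.util.regex.Pattern.
public List<String> scan(String string, String regex){
   List<String> list = new ArrayList<String>();
   Pattern pattern = Pattern.compile(regex);
   Matcher matcher = pattern.matcher(string);
   // Collect every match, in order of appearance.
   while(matcher.find()) {
      list.add(matcher.group());
   }
   return list;
}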

and we can call it as scan(string, "\\[\\d+\\]")
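For completeness, here is a self-contained sketch of that scan helper with a small driver. The wrapper class ScanDemo and the conversion to an array are additions for illustration; the asker wanted an array, so the list is converted at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScanDemo {
    // Returns every substring of input that matches regex, in order of appearance.
    static List<String> scan(String input, String regex) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "stack.overflow.questions[0].answer[1].postDate";
        List<String> found = scan(input, "\\[\\d+\\]");
        // Convert to the two-element array the question asked for.
        String[] tokens = found.toArray(new String[0]);
        System.out.println(String.join(" ", tokens));
    }
}
```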

The equivalent Scala code is:

"""\[\d+\]""".r findAllIn string
