Java - Regex problem

I want to remove the ) character from the end of a string using a regex.

E.g. if a string is UK(Great Britain), then I want to replace the last ) symbol.

Note:

1) The regex should remove only the last ) symbol, no matter how many ) symbols are present in the string.

Please don't use a regex for this simple task.

// If the last ) might not be the last character of the String
String s = "Your String with) multiple).";
int idx = s.lastIndexOf(')');
if (idx >= 0) { // lastIndexOf returns -1 when the String contains no ')'
    s = new StringBuilder(s).deleteCharAt(idx).toString();
}
// s = "Your String with) multiple."

// If the last ) will always be the last character of the String
s = "Your String with))";
if (s.endsWith(")")) {
    s = s.substring(0, s.length() - 1);
}
// s = "Your String with)"

If only a ) at the end of the string is to be removed, then this works:

str.replaceFirst("\\)$", "");

This matches exactly what it says: a literal ) (escaped because it's also a regex metacharacter) followed by $, the end-of-string anchor, and replaces it with the empty string, effectively deleting any terminating ).

If there is no match, there is no ) at the end of the string (even though there may be occurrences elsewhere), so no replacement is made and the string is unchanged.
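A minimal, self-contained sketch of the anchored approach; the class and method names (TrailingParen, removeTrailingParen) are hypothetical and only for illustration. It shows that a ) elsewhere in the string is left alone when the string does not end with one.

```java
public class TrailingParen {

    // Hypothetical helper: deletes one ')' only if it is the final character.
    static String removeTrailingParen(String str) {
        // "\\)" is a literal ')', "$" anchors the match to the end of the input
        return str.replaceFirst("\\)$", "");
    }

    public static void main(String[] args) {
        System.out.println(removeTrailingParen("UK(Great Britain)"));  // UK(Great Britain
        System.out.println(removeTrailingParen("UK(Great Britain)!")); // UK(Great Britain)!  (unchanged)
        System.out.println(removeTrailingParen("a))"));                // a)
    }
}
```

Only one ) is removed per call, even when the string ends with several.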


If, more generally, you want to remove the last occurrence of ), which may not be at the end of the string, you can use greedy .* matching:

str.replaceFirst("(.*)\\)", "$1");

Here the greedy .* captures into group 1 (\\1, referenced as $1 in the replacement string). If the whole pattern matches, group 1 will be as long as it possibly can be, which means that the literal ) following it must be the last occurrence (if there were another occurrence to its right, group 1 could have captured a longer string instead, which is a contradiction).
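To see the greedy argument in action, here is a small comparison sketch (class and method names are made up for illustration) that swaps in the reluctant quantifier .*? for contrast: the reluctant group stays as short as possible, so it stops before the first ), while the greedy group stops before the last one.

```java
public class GreedyVsReluctant {

    // Greedy .* : group 1 grows as long as possible, so the ')' after it is the last one
    static String greedy(String s) {
        return s.replaceFirst("(.*)\\)", "$1");
    }

    // Reluctant .*? : group 1 stays as short as possible, so the ')' after it is the first one
    static String reluctant(String s) {
        return s.replaceFirst("(.*?)\\)", "$1");
    }

    public static void main(String[] args) {
        String s = "a)b)c";
        System.out.println(greedy(s));    // a)bc
        System.out.println(reluctant(s)); // ab)c
    }
}
```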


Performance

Matching the first regex should be optimizable to an O(1) operation, thanks to the end-of-string $ anchor, although whether a given engine actually performs that optimization is another matter; java.util.regex, for one, still tries each start position in turn, which is O(N). The actual replacement will be O(N), because the new string has to be copied to a new buffer if there is a match. If there is no match, the original string can be returned unchanged, so on an engine that does optimize the anchor the whole operation would be O(1). This is as optimal as it gets.

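The second regex needs O(N) to match because of the repetition. This is no worse than a linear search for the last ) using lastIndexOf, which is also O(N). That holds when the string contains a ); if it contains none, an unanchored pattern is retried from every start position and each attempt rescans the rest of the string, so the failed search can degrade to O(N^2). Anchoring the pattern with ^ (for example (?s)^(.*)\\), where (?s) also lets . match line breaks) limits it to a single attempt.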
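A sketch of that anchored variant (not part of the original answer; the class and method names are hypothetical). The ^ means only the start of the input is tried, and the inline (?s) flag (DOTALL) lets . match line terminators. Without DOTALL, .* stops at a line break, so the plain pattern removes the last ) on the first line that contains one rather than the last ) in the whole string.

```java
import java.util.regex.Pattern;

public class LastParenMultiline {

    // (?s) = DOTALL: '.' also matches '\n'; '^' = only attempt a match at the start of the input
    private static final Pattern LAST_PAREN = Pattern.compile("(?s)^(.*)\\)");

    static String removeLastParen(String str) {
        return LAST_PAREN.matcher(str).replaceFirst("$1");
    }

    // The original, unanchored pattern without DOTALL, for comparison
    static String removeLastParenPlain(String str) {
        return str.replaceFirst("(.*)\\)", "$1");
    }

    public static void main(String[] args) {
        String text = "f(x)\ng(y)";
        // Escape the newline when printing so each result fits on one line
        System.out.println(removeLastParenPlain(text).replace("\n", "\\n")); // f(x\ng(y)
        System.out.println(removeLastParen(text).replace("\n", "\\n"));      // f(x)\ng(y
    }
}
```

On single-line input both methods return the same result; they differ only once line breaks are involved.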

If you're doing this a lot, you should know that replaceFirst is equivalent to compiling a Pattern and calling replaceFirst on its Matcher, so you can compile the Pattern once and reuse it. From the API:

An invocation of this method of the form

 str.replaceFirst(regex, repl) 

yields exactly the same result as the expression

 Pattern.compile(regex).matcher(str).replaceFirst(repl) 
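Following that equivalence, a minimal sketch that compiles the pattern once and reuses it (class, constant and method names are hypothetical). Pattern instances are immutable and safe to share between threads, so a static final field is a common place to keep one.

```java
import java.util.regex.Pattern;

public class PrecompiledParen {

    // Compiled once and reused on every call
    private static final Pattern LAST_CLOSE_PAREN = Pattern.compile("(.*)\\)");

    // Same result as str.replaceFirst("(.*)\\)", "$1"), without recompiling the regex each time
    static String removeLastCloseParen(String str) {
        return LAST_CLOSE_PAREN.matcher(str).replaceFirst("$1");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"UK(Great Britain)", "a)b)c", "no parens"}) {
            System.out.println(removeLastCloseParen(s));
        }
    }
}
```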

Readability

"Calling a replaceFirst method that's been hacked to actually replace last is just confusing."

It should be pointed out here that you can in fact use replaceAll with these exact patterns and, for single-line input, the solution would still work! All you really need is a regex replace, and whether it's replaceAll or replaceFirst really doesn't matter; the patterns are that simple!
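A quick check of that claim as a sketch (names are hypothetical). After (.*)\) has consumed everything up to the last ), the remainder contains no ), so replaceAll finds no second match; likewise \)$ can only match once at the end. On multi-line input, replaceAll with (.*)\) would instead remove the last ) of each line, because . does not cross line breaks by default.

```java
public class FirstVsAll {

    static String lastViaFirst(String s) { return s.replaceFirst("(.*)\\)", "$1"); }
    static String lastViaAll(String s)   { return s.replaceAll("(.*)\\)", "$1"); }
    static String endViaFirst(String s)  { return s.replaceFirst("\\)$", ""); }
    static String endViaAll(String s)    { return s.replaceAll("\\)$", ""); }

    public static void main(String[] args) {
        String s = "UK(Great Britain) (GB)";
        // On single-line input each pattern matches at most once, so both methods agree
        System.out.println(lastViaFirst(s).equals(lastViaAll(s))); // true
        System.out.println(endViaFirst(s).equals(endViaAll(s)));   // true
    }
}
```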

The needle$ pattern to match at the end of the string and the greedy (.*)needle pattern to match the last occurrence are basic idioms that are very readable and understandable to anyone with a basic understanding of regex. Neither really qualifies as a "hack".

Using a method called replaceFirst to replace the last occurrence of something may seem misleading at first, but this is shortsighted: it is the first match of the pattern that is replaced; what that pattern matches can be anything, be it the sixth "Sense" or the last "Mohican"!

As an analogy, let's take another simple string manipulation example: delete all "spam" substrings from a string. I would argue that the most readable solution is to use replace:

str.replace("spam", "");

"But wait! The name replace is misleading! You're not replacing it with something else! You should call a method called delete or something!"

That's silly talk, of course! You are indeed replacing it with something else: the empty string! Its effect is deletion, but the operation is still string replacement!

Just like the replaceFirst in my solution: you may want to replace the last occurrence of something, but it's still the first match of the overall pattern!

Now, it's true that a regex pattern out of nowhere can be confusing, but its intent can be made clear from context, e.g.:

public static String removeLastCloseParenthesis(String str) {
   return str.replaceFirst("(.*)\\)", "$1");
}

And you can always just give the thing a descriptive name. And you can always add comments as/if necessary. These are just general code readability techniques, and therefore apply to regex just as they do to everything else.

If you do want to use a regex (even though it's doable without one):

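import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "UK(Great Britain)";   // ... your string here ...
String parenReplacement = "!!!";  // whatever the replacement is
// group 1: everything before the last ')'; group 2: everything after it (contains no ')')
Pattern p = Pattern.compile("^(.*)\\)([^\\)]*)$");
Matcher m = p.matcher(s);
if (m.find())
{
   s = m.group(1) + parenReplacement + m.group(2);
}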
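An alternative sketch of the same replacement done with replaceFirst (the helper name is hypothetical). When the replacement text is arbitrary, Matcher.quoteReplacement escapes any $ or \ in it so they are inserted literally instead of being read as group references; the group-concatenation approach above sidesteps that issue by building the string by hand.

```java
import java.util.regex.Matcher;

public class ReplaceLastParen {

    // Hypothetical helper: replaces the last ')' in s with an arbitrary replacement string
    static String replaceLastParen(String s, String replacement) {
        // quoteReplacement escapes '$' and '\' so "$2" is inserted literally,
        // rather than being treated as a reference to a (nonexistent) group 2
        return s.replaceFirst("(.*)\\)", "$1" + Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceLastParen("UK(Great Britain)", "!!!")); // UK(Great Britain!!!
        System.out.println(replaceLastParen("a)b)c", "$2"));              // a)b$2c
    }
}
```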

Why would you use a regex for that? Just use String.charAt(...) and substring(...)!
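A minimal sketch of that suggestion for the trailing-) case (class and method names are hypothetical). The isEmpty() check matters because charAt(-1) would throw on an empty string.

```java
public class CharAtSubstring {

    // Removes one trailing ')' using only charAt and substring, no regex
    static String removeTrailingParen(String s) {
        // Guard against the empty string before looking at the last character
        if (!s.isEmpty() && s.charAt(s.length() - 1) == ')') {
            return s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(removeTrailingParen("UK(Great Britain)"));  // UK(Great Britain
        System.out.println(removeTrailingParen("UK(Great Britain)!")); // UK(Great Britain)!
    }
}
```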

You don't really need a regex for this. The String class has a lastIndexOf() method that you can use to find the index of the last ) in the String. See the String.lastIndexOf documentation.
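A sketch of the lastIndexOf approach that also covers the "replace" wording in the question (the helper name is hypothetical): pass "" to delete the last ), or any other string to replace it. The -1 check handles strings with no ) at all.

```java
public class LastIndexOfParen {

    // Replaces the last ')' anywhere in s; pass "" to delete it. No regex involved.
    static String replaceLastParen(String s, String replacement) {
        int i = s.lastIndexOf(')');
        if (i < 0) {
            return s; // lastIndexOf returns -1 when there is no ')'
        }
        return s.substring(0, i) + replacement + s.substring(i + 1);
    }

    public static void main(String[] args) {
        System.out.println(replaceLastParen("a)b)c", ""));    // a)bc
        System.out.println(replaceLastParen("a)b)c", "!!!")); // a)b!!!c
        System.out.println(replaceLastParen("abc", ""));      // abc
    }
}
```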

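Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.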

© 2020-2024 STACKOOM.COM