
String.replaceAll() is not working for some strings

I am cleaning up email addresses that I got from Tesseract OCR.

Here is my code:

if (email != null) {
    email = email.replaceAll(" ", "");
    email = email.replaceAll("caneer", "career");
    email = email.replaceAll("canaer", "career");
    email = email.replaceAll("canear", "career");
    email = email.replaceAll("caraer", "career");
    email = email.replaceAll("carear", "career");
    email = email.replace("|", "l");
    email = email.replaceAll("}", "j");
    email = email.replaceAll("j3b", "job");
    email = email.replaceAll("gmaii.com", "gmail.com");
    email = email.replaceAll("hotmaii.com", "hotmail.com");
    email = email.replaceAll(".c0m", ".com");
    email = email.replaceAll(".coin", ".com");
    email = email.replaceAll("consuit", "consult");
}
return email;

But the output is not correct.

Input:

amrut=ac.hrworks@g mai|.com

Output:

lalcl.lhlrlwlolrlklsl@lglmlalil|l.lclolml

But when I assign the result to a new String after every replacement, it works fine. Why does repeated assignment to the same String not work?

You'll note in the Javadoc for String.replaceAll() that the first argument is a regular expression.

A period ( . ) has a special meaning there, as do a pipe ( | ) and a curly brace ( } ). You need to escape them all, for example:

email = email.replaceAll("gmaii\\.com", "gmail.com");
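To make the effect of an unescaped metacharacter concrete, here is a minimal sketch. The class and method names are made up for illustration. It contrasts the unescaped and escaped forms of the dot and the pipe. An unescaped | is an alternation between two empty patterns, so it matches the empty string at every position, which is the kind of output shown in the question.

```java
class RegexMetacharDemo {
    // Unescaped: '.' matches any character, so "gmaiixcom" is rewritten too.
    static String unescapedDot(String s) {
        return s.replaceAll("gmaii.com", "gmail.com");
    }

    // Escaped: only a literal dot matches.
    static String escapedDot(String s) {
        return s.replaceAll("gmaii\\.com", "gmail.com");
    }

    // Unescaped: "|" matches the empty string at every position,
    // so the replacement is inserted between all characters.
    static String unescapedPipe(String s) {
        return s.replaceAll("|", "l");
    }

    // Escaped: only a literal pipe character is replaced.
    static String escapedPipe(String s) {
        return s.replaceAll("\\|", "l");
    }

    public static void main(String[] args) {
        System.out.println(unescapedDot("gmaiixcom")); // gmail.com
        System.out.println(escapedDot("gmaiixcom"));   // gmaiixcom
        System.out.println(unescapedPipe("a|b"));      // lal|lbl
        System.out.println(escapedPipe("mai|"));       // mail
    }
}
```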

(Is this Java?)

Note that in Java, replaceAll accepts a regular expression, and the dot matches any character. You need to escape the dot or use

somestring.replaceAll(Pattern.quote("gmail.com"), "replacement");
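As a small sketch of what Pattern.quote buys you (the class and method names are invented for this example): the quoted pattern only matches the exact text, whereas the raw pattern lets the dot match any character.

```java
import java.util.regex.Pattern;

class PatternQuoteDemo {
    // Raw pattern: the dot is a wildcard, so "xc0m" also matches.
    static String rawPattern(String s) {
        return s.replaceAll(".c0m", ".com");
    }

    // Quoted pattern: every character of ".c0m" is taken literally.
    static String quotedPattern(String s) {
        return s.replaceAll(Pattern.quote(".c0m"), ".com");
    }

    public static void main(String[] args) {
        System.out.println(rawPattern("user@hostxc0m"));    // user@host.com
        System.out.println(quotedPattern("user@hostxc0m")); // user@hostxc0m
        System.out.println(quotedPattern("user@host.c0m")); // user@host.com
    }
}
```

Pattern.quote only protects the pattern. If the replacement text could contain $ or \, Matcher.quoteReplacement does the same job for the second argument.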

Also note the typo here:

email = emai.replaceAll("canear", "career");

should be

email = email.replaceAll("canear", "career");

You have to escape . as \\. like the following:

if (email != null) {
    email = email.replaceAll(" ", "");
    email = email.replaceAll("caneer", "career");
    email = email.replaceAll("canaer", "career");
    email = email.replaceAll("canear", "career");
    email = email.replaceAll("caraer", "career");
    email = email.replaceAll("carear", "career");
    email = email.replace("|", "l");
    email = email.replaceAll("}", "j");
    email = email.replaceAll("j3b", "job");
    email = email.replaceAll("gmaii\\.com", "gmail.com");
    email = email.replaceAll("hotmaii\\.com", "hotmail.com");
    // Keep the dot in the replacement; only the pattern needs escaping.
    email = email.replaceAll("\\.c0m", ".com");
    email = email.replaceAll("\\.coin", ".com");
    email = email.replaceAll("consuit", "consult");
}
return email;
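If none of the fixes actually needs regex features, another option is to skip replaceAll entirely. String.replace(CharSequence, CharSequence) replaces every occurrence of a literal string, so nothing has to be escaped. The sketch below is one possible arrangement, not the asker's code: the class name, the table, and the ordering (domain-suffix fixes before the gmaii/hotmaii fixes, so that "gmaii.c0m" is fully corrected) are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OcrEmailCleaner {
    // Hypothetical fix-up table. LinkedHashMap keeps insertion order,
    // which matters because later fixes depend on earlier ones.
    private static final Map<String, String> FIXES = new LinkedHashMap<>();
    static {
        FIXES.put(" ", "");
        FIXES.put("|", "l");
        FIXES.put("}", "j");
        FIXES.put("j3b", "job");
        FIXES.put(".c0m", ".com");
        FIXES.put(".coin", ".com");
        FIXES.put("gmaii.com", "gmail.com");
        FIXES.put("hotmaii.com", "hotmail.com");
        FIXES.put("consuit", "consult");
    }

    static String clean(String email) {
        if (email == null) {
            return null;
        }
        for (Map.Entry<String, String> fix : FIXES.entrySet()) {
            // String.replace treats both arguments literally (no regex).
            email = email.replace(fix.getKey(), fix.getValue());
        }
        return email;
    }

    public static void main(String[] args) {
        // Input taken from the question.
        System.out.println(clean("amrut=ac.hrworks@g mai|.com")); // amrut=ac.hrworks@gmail.com
    }
}
```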

Once you realize that the first argument of replaceAll() is a regex, you can get by with far fewer replacements.

For example, you can catch the possible misspellings of the word career with the following regex:

email = email.replaceAll("ca[nr][ea][ea]r", "career");
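A quick way to check a character-class pattern like this is to run it over the five misspellings from the question. The sketch below does that; the class and method names are invented for the example. Two things to keep in mind: inside square brackets | is a literal pipe rather than an alternation, so it does not belong in the class, and the pattern is slightly broader than the original list (it also rewrites, say, "canaar").

```java
class CareerTypoDemo {
    // One character class per ambiguous letter: n or r, then e or a, twice.
    static String fixCareer(String s) {
        return s.replaceAll("ca[nr][ea][ea]r", "career");
    }

    public static void main(String[] args) {
        String[] samples = {"caneer", "canaer", "canear", "caraer", "carear"};
        for (String sample : samples) {
            // Each misspelling from the question is rewritten to "career".
            System.out.println(sample + " -> " + fixCareer(sample));
        }
    }
}
```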

You are using some regex metacharacters.

Please escape them with \\ or by using the Pattern.quote method.

I think you are not aware that the first parameter of replaceAll is a regex.

. , | and } might be interpreted differently from what you expect.

.   Any character (may or may not match line terminators)

http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html

For whitespace, you would do better to use

\s  A whitespace character: [ \t\n\x0B\f\r]

and escape other special characters with a leading \\
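As a rough illustration of the \s suggestion (class and method names are made up here): OCR output can contain tabs or line breaks as well as plain spaces, and "\\s+" removes all of them in one pass, where " " would only remove the space character.

```java
class WhitespaceDemo {
    // "\\s+" in a Java string literal is the regex \s+ :
    // one or more whitespace characters (space, tab, newline, ...).
    static String stripWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripWhitespace("hrworks@g mai\tl.com\n")); // hrworks@gmail.com
    }
}
```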
