
Remove email address from Java string

How can I remove an email address from a string, along with all digits and special characters?

A sample string could be:

"Hello world my # is 123 mail me @ test@test.com"

The output string should be:

"Hello world my is mail me"

I searched online and found that I could use the following regular expression:

"[^A-Za-z0-9\\.\\@_\\-~#]+"

but that example was meant for validating email addresses, not for removing them. I am new to Java!
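To see why that regex doesn't do the job, here is a small sketch that feeds it to replaceAll (the class and method names are made up for illustration). Because the character class is negated, it matches runs of characters that are *outside* the email alphabet, so it deletes only the spaces and leaves the address, the digits and the # in place:

```java
public class QuestionRegexDemo {
    // The regex from the question: one or more characters that are NOT
    // letters, digits, '.', '@', '_', '-', '~' or '#'.
    static final String QUESTION_REGEX = "[^A-Za-z0-9\\.\\@_\\-~#]+";

    // Hypothetical helper: deletes every run of characters outside that set.
    static String strip(String input) {
        return input.replaceAll(QUESTION_REGEX, "");
    }

    public static void main(String[] args) {
        String input = "Hello world my # is 123 mail me @ test@test.com";
        System.out.println(strip(input));
    }
}
```

For the sample input this prints Helloworldmy#is123mailme@test@test.com, i.e. the email is still there.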

As pointed out by others, you could use regular expressions to clean up your string and replace the unwanted parts with an empty string "". To do so, have a look at the replaceAll(String regex, String replacement) method of the String class, and at the Pattern class for the syntax of regular expressions in Java.

Below is some code demonstrating one way to clean the provided sample string (maybe not the most elegant, though):

public class CleanString {
    // Matches addresses such as test@test.com
    private static final String EMAIL_PATTERN =
            "([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)";

    public static void main(String[] args) {
        String input = "Hello world my # is 123 mail me @ test@test.com";

        String output = input.replaceAll(EMAIL_PATTERN, "") // Replace emails
                                                            // by an empty string
                .replaceAll("\\p{Punct}", "") // Replace all punctuation. One of
                                              // !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
                .replaceAll("\\d", "") // Replace any digit by an empty string
                .replaceAll("\\p{Blank}{2,}+", " "); // Replace any blank (a space or
                                                     // a tab) repeated more than once
                                                     // by a single space.

        System.out.println(output);
    }
}

Running this code produces the following output:

Hello world my is mail me 

If you need to remove more unwanted text (or less, for example keeping the punctuation), the principle is the same: adapt it to suit your needs.
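As an illustration of the "less" case, here is a hypothetical variant (the clean method and class names are assumptions, not part of the answer) that keeps punctuation, removes only emails and digits, and additionally calls trim() to drop the trailing space visible in the output above:

```java
public class KeepPunctuationDemo {
    // Same email pattern as in the answer above.
    static final String EMAIL_PATTERN =
            "([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)";

    // Hypothetical helper: removes emails and digits but keeps punctuation.
    static String clean(String input) {
        return input.replaceAll(EMAIL_PATTERN, "")  // remove email addresses
                .replaceAll("\\d", "")              // remove digits
                .replaceAll("\\p{Blank}{2,}+", " ") // collapse repeated spaces/tabs
                .trim();                            // drop leading/trailing whitespace
    }

    public static void main(String[] args) {
        String input = "Hello world my # is 123 mail me @ test@test.com";
        System.out.println(clean(input));
    }
}
```

For the sample string this yields "Hello world my # is mail me @".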

You can use String#replaceAll() for this: just let it replace every regex match with an empty string "". The regex you mentioned is, however, not very robust. A better one is this (copied from here and slightly changed for use in plain text):

string = string.replaceAll("([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)", "");

Hope this helps.
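A quick sketch of that regex on its own (the class name and sample text are made up). It handles dotted local parts and multi-label domains, but it removes only the addresses themselves, so the spaces around them remain and may need collapsing afterwards:

```java
public class RemoveEmailsDemo {
    // The email regex from this answer.
    static final String EMAIL_REGEX =
            "([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)";

    static String removeEmails(String text) {
        return text.replaceAll(EMAIL_REGEX, "");
    }

    public static void main(String[] args) {
        // Hypothetical sample with two addresses, one with a dotted local part.
        String text = "Write to first.last@mail.example.org or test@test.com today";
        System.out.println("[" + removeEmails(text) + "]");
    }
}
```

This prints [Write to  or  today], with a double space where each address used to be.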

Check out the Java regular expression Pattern class and its uses. There's a useful tutorial here which includes replacement methods.
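As a sketch of the Pattern-based route (the class and constant names are my own), the patterns used in this thread can be compiled once and reused through Matcher.replaceAll. String.replaceAll(regex, repl) is documented to give the same result as Pattern.compile(regex).matcher(str).replaceAll(repl), so it compiles the regex on every call; precompiling mainly pays off when cleaning many strings:

```java
import java.util.regex.Pattern;

public class PrecompiledPatternDemo {
    // Compiled once and reused for every call to clean().
    private static final Pattern EMAIL =
            Pattern.compile("([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)");
    // Anything that is neither an ASCII letter nor a space.
    private static final Pattern NOT_LETTER_OR_SPACE = Pattern.compile("[^\\p{Alpha} ]");
    // Two or more consecutive spaces.
    private static final Pattern MULTIPLE_SPACES = Pattern.compile(" {2,}");

    static String clean(String input) {
        String s = EMAIL.matcher(input).replaceAll("");    // remove email addresses
        s = NOT_LETTER_OR_SPACE.matcher(s).replaceAll(""); // remove digits and punctuation
        return MULTIPLE_SPACES.matcher(s).replaceAll(" "); // collapse runs of spaces
    }

    public static void main(String[] args) {
        System.out.println(clean("Hello world my # is 123 mail me @ test@test.com"));
    }
}
```

For the sample input, clean returns "Hello world my is mail me " (note the trailing space).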

An aside: this is a particularly robust regexp for RFC 822-compliant email addresses :-) You should be able to come up with something more concise for your needs! There's a discussion of email regexps and their trade-offs here.
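For instance, a much looser pattern such as \S+@\S+ (my own choice, not taken from the linked pages) removes any whitespace-free token with at least one character on each side of an @. It is far shorter, but it also catches text that isn't an address. A minimal sketch:

```java
public class LooseEmailPatternDemo {
    // Loose pattern: one or more non-whitespace characters, '@',
    // then one or more non-whitespace characters.
    static final String LOOSE_EMAIL = "\\S+@\\S+";

    static String removeEmails(String text) {
        return text.replaceAll(LOOSE_EMAIL, "");
    }

    public static void main(String[] args) {
        // The lone '@' survives because it has nothing on either side.
        System.out.println("[" + removeEmails("Hello world my # is 123 mail me @ test@test.com") + "]");
        // Not an address, but removed anyway.
        System.out.println("[" + removeEmails("Buy 3@5 each") + "]");
    }
}
```

The first call prints [Hello world my # is 123 mail me @ ]; the second prints [Buy  each], whereas the stricter pattern from the answers above would leave 3@5 alone because its domain part has no dot.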

From your example, it looks like you want to remove not just email addresses but all non-alphabetic characters, so this is straightforward:

str = str.replaceAll("([^.@\\s]+)(\\.[^.@\\s]+)*@([^.@\\s]+\\.)+([^.@\\s]+)", "")
         .replaceAll("[^\\p{Alpha} ]", "")
         .replaceAll("[ ]{2,}+", " ");

See the Pattern Javadoc for what the predefined character class \p{Alpha} (written "\\p{Alpha}" inside a Java string literal) means: an ASCII letter, a-z or A-Z.
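One consequence worth knowing, sketched below with made-up class and method names: by default \p{Alpha} covers only ASCII letters, so [^\p{Alpha} ] also strips accented letters. If the text may contain non-English words, the Unicode letter category \p{L} is one alternative:

```java
public class AlphaClassDemo {
    // \p{Alpha} is the POSIX class [a-zA-Z] (US-ASCII only by default).
    static String keepAsciiLetters(String s) {
        return s.replaceAll("[^\\p{Alpha} ]", "");
    }

    // \p{L} matches any Unicode letter, including accented ones.
    static String keepUnicodeLetters(String s) {
        return s.replaceAll("[^\\p{L} ]", "");
    }

    public static void main(String[] args) {
        // The e-acute is written as an escape to avoid source-encoding issues.
        String s = "Caf\u00e9 #42";
        System.out.println(keepAsciiLetters(s));
        System.out.println(keepUnicodeLetters(s));
    }
}
```

keepAsciiLetters returns "Caf " while keepUnicodeLetters keeps the é and returns "Café ".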

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address.
