String.replaceAll letting some chars through the cracks

So I'm working with a huge dataset in Java trying to scrub the text of everything but alpha characters. Right now I'm doing this with:

snippet = snippet.toLowerCase();
snippet.replaceAll("[^A-Za-z]", "");                

However, the sanitization is not going as planned: some extraneous @, #, ?, and : characters are making their way through. Any ideas?

In Java, Strings are immutable: their value can't be changed. Consequently, replaceAll() returns the altered String; it doesn't change the String on which it was called.

You must assign the return value back to the variable:

snippet = snippet.replaceAll("[^A-Za-z]", "");
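
To see the difference, here is a small sketch that runs both forms side by side. The class name ReplaceAllDemo, the helper lettersOnly, and the sample text are made up for illustration; only String.replaceAll and the regex come from the question.

```java
class ReplaceAllDemo {
    // Hypothetical helper: returns a NEW string containing only ASCII letters.
    static String lettersOnly(String text) {
        return text.replaceAll("[^A-Za-z]", "");
    }

    public static void main(String[] args) {
        String snippet = "Hi @there, #Java? yes: ok";

        // The result is thrown away, so snippet is left untouched.
        snippet.replaceAll("[^A-Za-z]", "");
        System.out.println(snippet); // Hi @there, #Java? yes: ok

        // Assigning the return value back is what actually updates the variable.
        snippet = lettersOnly(snippet);
        System.out.println(snippet); // HithereJavayesok
    }
}
```

The same applies to every other "modifying" method on String, such as toLowerCase(), trim(), and replace(): each one returns a new String.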

Although this behaviour at first seems "non Object Oriented", when the class is immutable it does make sense.

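Also, the call to .toLowerCase() is not needed for the filtering itself, since your regex keeps uppercase letters too. Keep it only if you actually want lowercase output, in which case the character class can be shortened to [^a-z].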
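
If lowercase output is what you want, the two steps can be chained, roughly like this. The class ScrubDemo and the method scrub are hypothetical names, and passing Locale.ROOT is my own assumption rather than something from the question: the no-argument toLowerCase() uses the default locale, and under a Turkish locale "I" lowercases to a dotless "ı", which [^a-z] would then strip.

```java
import java.util.Locale;

class ScrubDemo {
    // Hypothetical helper: lowercases first, then drops everything outside a-z.
    // Locale.ROOT is an assumption made here to keep the result locale-independent.
    static String scrub(String text) {
        return text.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
    }

    public static void main(String[] args) {
        System.out.println(scrub("Hi @there, #Java? yes: ok")); // hitherejavayesok
    }
}
```

Note that this, like the original regex, also removes digits, whitespace, and non-ASCII letters such as "é".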
