So I'm working with a huge dataset in Java trying to scrub the text of everything but alpha characters. Right now I'm doing this with:
snippet = snippet.toLowerCase();
snippet.replaceAll("[^A-Za-z]", "");
However, the sanitization is not going as planned. Some extraneous @, #, ?, and : characters are making their way through. Ideas?
In Java, Strings are immutable: their value can't be changed. Consequently, replaceAll() returns the altered String; it doesn't change the String on which it was called.
You must assign the return value back to the variable:
snippet = snippet.replaceAll("[^A-Za-z]", "");
Although this behaviour at first seems "non Object Oriented", when the class is immutable it does make sense.
Also, you don't need the call to .toLowerCase() for the filtering itself: your regex matches uppercase letters too. Keep it only if you actually want lowercase output.
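To make the difference concrete, here is a minimal sketch contrasting the discarded return value with the reassigned one. The class name SnippetScrubber, the helper lettersOnly, and the sample input are made up for illustration; the lowercasing step is included on the assumption that lowercase output is wanted.

```java
import java.util.Locale;

public class SnippetScrubber {

    // Hypothetical helper: strips everything except ASCII letters, then lowercases.
    static String lettersOnly(String text) {
        // replaceAll returns a new String, so the result has to be captured.
        String stripped = text.replaceAll("[^A-Za-z]", "");
        // Locale.ROOT avoids locale-specific case rules (e.g. the Turkish dotless i).
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String snippet = "Hello, @World #42: ok?";

        // Return value discarded: snippet still holds the original text.
        snippet.replaceAll("[^A-Za-z]", "");
        System.out.println(snippet); // prints "Hello, @World #42: ok?"

        // Return value assigned back: snippet now holds the scrubbed text.
        snippet = lettersOnly(snippet);
        System.out.println(snippet); // prints "helloworldok"
    }
}
```

For a huge dataset it may also be worth compiling the pattern once with java.util.regex.Pattern.compile and reusing it, since String.replaceAll compiles the regex on every call.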