Basically, I am wondering if there is a handy class or method to filter a String for unwanted characters. The output of the method should be the 'cleaned' String. For example:
String dirtyString = "This contains spaces which are not allowed";
String result = cleaner.getCleanedString(dirtyString);
The expected result would be:
"Thiscontainsspaceswhicharenotallowed"
A better example:
String reallyDirty = " this*is#a*&very_dirty&String";
String result = cleaner.getCleanedString(reallyDirty);
I expect the result to be:
"thisisaverydirtyString"
This is because I let the cleaner know that ' ', '*', '#', '&' and '_' are dirty characters. I could solve it with a whitelist/blacklist array of chars, but I don't want to reinvent the wheel.
I was wondering if there is already such a thing that can 'clean' strings using a regex, so that I don't have to write it myself.
Addition: If you think cleaning a String could be done differently or better, then I'm all ears as well, of course.
Another addition: It is not only for spaces, but for any kind of character.
Edit, based on your update:
dirtyString.replaceAll("[^a-zA-Z0-9]","")
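The pattern above is a whitelist (keep only ASCII letters and digits). If you would rather name the dirty characters, as in the question, the same regex idea can be wrapped in a small class. The following is only a sketch: `RegexCleaner` is a hypothetical name chosen to mirror the `cleaner.getCleanedString(...)` call in the question, not an existing JDK or library class. Each dirty character is passed through `Pattern.quote` so that metacharacters such as `*` are matched literally.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper (not a JDK class): removes a fixed set of unwanted characters. */
final class RegexCleaner {
    private final Pattern dirty;

    /** Every character of dirtyChars is treated as unwanted. */
    RegexCleaner(String dirtyChars) {
        // Quote each character individually, then join the pieces into an
        // alternation, so '*' or '#' never act as regex metacharacters.
        String alternation = dirtyChars.codePoints()
                .mapToObj(cp -> Pattern.quote(new String(Character.toChars(cp))))
                .collect(Collectors.joining("|"));
        this.dirty = Pattern.compile(alternation);
    }

    /** Returns input with every dirty character removed. */
    String getCleanedString(String input) {
        return dirty.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        RegexCleaner cleaner = new RegexCleaner(" *#&_");
        // prints: thisisaverydirtyString
        System.out.println(cleaner.getCleanedString(" this*is#a*&very_dirty&String"));
    }
}
```

Compiling the pattern once in the constructor means the cleaner can be reused for many strings without rebuilding the regex each time.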
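If you're using Guava in your project (and if you're not, I believe you should consider it), the CharMatcher class handles this very nicely. Note that recent Guava releases replace the constants used below with static factory methods, for example CharMatcher.whitespace() instead of CharMatcher.WHITESPACE.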
Your first example might be:
result = CharMatcher.WHITESPACE.removeFrom(dirtyString);
while your second might be:
result = CharMatcher.anyOf(" *#&_").removeFrom(dirtyString);
// or alternatively
result = CharMatcher.noneOf(" *#&_").retainFrom(dirtyString);
or if you want to be more flexible with whitespace (tabs etc), you can combine them rather than writing your own:
CharMatcher illegal = CharMatcher.WHITESPACE.or(CharMatcher.anyOf("*#&_"));
result = illegal.removeFrom(dirtyString);
or you might instead specify legal characters, which depending on your requirements might be:
CharMatcher legal = CharMatcher.JAVA_LETTER; // based on Unicode char class
CharMatcher legal = CharMatcher.ASCII.and(CharMatcher.JAVA_LETTER); // only letters which are also ASCII, as your examples
CharMatcher legal = CharMatcher.inRange('a', 'z'); // lowercase only
CharMatcher legal = CharMatcher.inRange('a', 'z').or(CharMatcher.inRange('A', 'Z')); // either case
followed by retainFrom(dirtyString), as above.
Very nice, powerful API.
Use replaceAll.
This will do it:
String dirtyString = "This contains spaces which are not allowed";
String result = dirtyString.replaceAll("\\s", "");
and works by replacing all whitespace with 'nothing'.
String resultString = subjectString.replaceAll("\\P{L}+", "");
This replaces every non-letter character with nothing.
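To make the difference between the two patterns concrete, here is a small sketch that applies both to the string from the question. The class and method names (`ReplaceAllDemo`, `stripWhitespace`, `keepLetters`) are made up for illustration. The patterns are precompiled because `String.replaceAll` compiles its regex on every call.

```java
import java.util.regex.Pattern;

/** Illustrative only: compares the whitespace and non-letter patterns. */
final class ReplaceAllDemo {
    // \s matches whitespace characters; \P{L} matches anything that is
    // not a Unicode letter (digits, punctuation, whitespace, symbols).
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");
    private static final Pattern NON_LETTERS = Pattern.compile("\\P{L}+");

    static String stripWhitespace(String s) {
        return WHITESPACE.matcher(s).replaceAll("");
    }

    static String keepLetters(String s) {
        return NON_LETTERS.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String input = " this*is#a*&very_dirty&String";
        System.out.println(stripWhitespace(input)); // this*is#a*&very_dirty&String
        System.out.println(keepLetters(input));     // thisisaverydirtyString
    }
}
```

Only the second pattern produces the result asked for in the question, because the first one leaves '*', '#', '&' and '_' in place.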
I also prefer the whitelisting approach. You never know what input will come along. There seem to be more encodings than characters. This way you can control it all:
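public String convert(String s) {
    // The apostrophe and backtick belong inside the character class; outside it,
    // the pattern would only match an unwanted character followed by that literal text.
    s = StringUtils.removePattern(s, "[^A-Za-zäöüÄÖÜß?!$,. 0-9\\-\\+\\*\\?=&%\\$§\"\\!\\^#:;,_²³°\\[\\]\\{\\}<>\\|~'`]");
    return s.trim();
}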
This contains all German umlauts and French accents and ... you know - just look at your keyboard. I think I picked them all. Feel free to omit special characters like < and > to prevent code injection.
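If you do not want to enumerate accented letters by hand, a whitelist can also be written with Unicode property classes using only the JDK. This is a sketch under stated assumptions: `WhitelistCleaner` is a hypothetical class, and the small punctuation set it allows (space, '.', ',', '!', '?', '-') is chosen purely for illustration, so adjust it to your own requirements.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; the allowed punctuation set is an assumption. */
final class WhitelistCleaner {
    // Anything that is NOT a Unicode letter, a decimal digit, a space,
    // or one of . , ! ? - is treated as unwanted.
    private static final Pattern NOT_ALLOWED =
            Pattern.compile("[^\\p{L}\\p{Nd} .,!?\\-]");

    static String convert(String s) {
        return NOT_ALLOWED.matcher(s).replaceAll("").trim();
    }

    public static void main(String[] args) {
        // The escapes spell o-umlaut, sharp s and e-acute, so the source
        // file compiles regardless of its encoding.
        System.out.println(convert(" Gr\u00f6\u00dfe: <b>caf\u00e9</b>! "));
    }
}
```

Because `\p{L}` covers letters from every script, umlauts and accented letters pass through without being listed, while '<', '>', '/' and ':' are removed.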
Regex is not the only avenue to your goal. You can get the code point integer number for each character in your string, then filter out those not considered a letter in Unicode.
The String#codePoints method returns an IntStream, a stream of int primitive values, one per character.
The Character class can tell us if the character assigned to each of those code point numbers in Unicode is considered a letter, as opposed to whitespace, digits, punctuation, and so on.
Those code points passing our test are converted back to a String by way of the StringBuilder class.
String input = " this*is#a*&very_dirty&String" ;
String onlyLetters =
        input
                .codePoints()
                .filter(
                        codePoint -> Character.isLetter( codePoint )
                )
                .collect(
                        StringBuilder :: new ,
                        StringBuilder :: appendCodePoint ,
                        StringBuilder :: append
                )
                .toString()
        ;
See this code run live at Ideone.com.
thisisaverydirtyString
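The same code-point pipeline can be generalized so that the caller decides what counts as clean. The sketch below assumes a hypothetical helper class `CodePointFilter` with a `retain` method (names invented here), which takes the test as an `IntPredicate`. This covers both the whitelist style (keep letters) and the blacklist style from the question (drop a given set of characters).

```java
import java.util.function.IntPredicate;

/** Hypothetical helper: keeps only the code points accepted by a predicate. */
final class CodePointFilter {
    static String retain(String input, IntPredicate keep) {
        return input.codePoints()
                .filter(keep)
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        String input = " this*is#a*&very_dirty&String";

        // Whitelist: keep Unicode letters only.
        System.out.println(retain(input, Character::isLetter));

        // Blacklist: drop the characters named in the question.
        String dirty = " *#&_";
        System.out.println(retain(input, cp -> dirty.indexOf(cp) < 0));
    }
}
```

Both calls print thisisaverydirtyString. Working on code points rather than `char` values keeps characters outside the Basic Multilingual Plane intact instead of splitting them into surrogate halves.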