Removing certain characters from a string

I am thinking about using String.replaceAll() to remove certain characters from my string. I don't know in advance which characters I will want to remove, so I assume any character is possible (letters like [a-zA-Z], symbols like $%!, etc.).

I came across http://www.java-tips.org/java-se-tips/java.lang/strip-certain-characters-from-a-string.html but surely there is a better way than iterating over each character...

Any thoughts on this?

Thanks

EXAMPLE:

Just to clarify, I will have strings of varying lengths. I want to strip characters from it, the exact ones to be determined at runtime, and return the resulting string.

Taking the paragraph above and stripping out the characters ",.", I would return the string:

Just to clarify I will have strings of varying lengths I want to strip characters from it the exact ones to be determined at runtime and return the resulting string

As an aside, I know that replaceAll() uses regular expressions, so if I wanted to strip out the characters "$,.", I would need to escape them too, right?

You might want to start by specifying which characters you want to keep. Try something like:

"mystring".replaceAll("[^a-zA-Z]", "")

This keeps only letters.
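A minimal sketch of that keep-list idea, assuming the characters worth keeping are the ASCII letters. The class and method names (KeepLetters, keepOnlyLetters) are made up for illustration:

```java
class KeepLetters {
    // Removes every character that is not an ASCII letter.
    // Note that spaces, digits and punctuation are all removed too.
    static String keepOnlyLetters(String s) {
        return s.replaceAll("[^a-zA-Z]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepOnlyLetters("my string, 123!"));
    }
}
```

If spaces should survive as well, they have to be added to the class explicitly, for example "[^a-zA-Z ]".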

I think the code below will help. String.replace() treats its arguments as literal text, not as a regex, so nothing needs to be escaped.

    String input = "Just to clarify, I will have strings of varying "
      + "lengths. I want to strip characters from it, the exact "
      + "ones to be determined at runtime, and return the "
      + "resulting string.";
    String regx = ",.";
    char[] ca = regx.toCharArray();
    for (char c : ca) {
        input = input.replace(""+c, "");
    }
    System.out.println(input);

This is one of those cases where regular expressions are probably not a good idea. You're going to end up writing more special code to get around regex than if you just take the simple approach and iterate over the characters. You also risk overlooking some cases that might surface as a bug later.

If you're concerned about performance, regex is actually going to be much slower. If you look through the code or profile the use of it, regex has to create a pattern to parse/compile, run through the matching logic and then apply your replacement. All of that creates a lot of objects, which can be expensive if you iterate on this frequently enough.

I'd implement what you found at that link a little differently, though. Building the result in a StringBuilder avoids unnecessary String allocations without adding any complexity:

public static String stripChars(String input, String strip) {
    StringBuilder result = new StringBuilder();
    for (char c : input.toCharArray()) {
        if (strip.indexOf(c) == -1) {
            result.append(c);
        }
    }
    return result.toString();
}
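The loop above works on char values, so a character outside the Basic Multilingual Plane (stored as a surrogate pair) is treated as two separate chars. If that matters for your input, here is a possible variant that works on code points instead. It assumes Java 8 or later, and the names CodePointStripper and stripCodePoints are hypothetical:

```java
class CodePointStripper {
    // Removes every code point of `input` that also occurs in `strip`.
    static String stripCodePoints(String input, String strip) {
        StringBuilder result = new StringBuilder(input.length());
        input.codePoints()
             // String.indexOf(int) accepts a full code point
             .filter(cp -> strip.indexOf(cp) == -1)
             .forEach(result::appendCodePoint);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripCodePoints("Just to clarify, I will.", ",."));
    }
}
```

For plain ASCII or other BMP-only text the two versions behave the same, so the simpler char loop is fine there.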

If you're already using the library, Guava makes this easy with CharMatcher:

import com.google.common.base.CharMatcher;

String charsToRemove = "%^#";
String stringToFilter = "I have 20% of my assets in #2 pencils! :^)";

String filtered = CharMatcher.anyOf(charsToRemove).removeFrom(stringToFilter);

I think this can be done by using regular expressions.

First, we know that [a-zA-Z] and $%! are valid characters in the string, so we can use the regex "[^a-zA-Z0-9$%!]" to strip out all other, invalid characters. See http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html for details on Java patterns.

Next, we can use mystring.replaceAll(String regex, String replacement)
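Put together, those two steps might look like the sketch below. It compiles the pattern once and reuses it, which is an assumption on my part about how the method would be called (repeatedly); the names ValidCharFilter and keepValid are made up:

```java
import java.util.regex.Pattern;

class ValidCharFilter {
    // Matches any single character that is NOT a letter, a digit, or one of $ % !
    // Inside a character class, $ needs no escaping.
    private static final Pattern INVALID = Pattern.compile("[^a-zA-Z0-9$%!]");

    // Returns `s` with every invalid character removed.
    static String keepValid(String s) {
        return INVALID.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(keepValid("20% off, now!"));
    }
}
```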

PS: RegexPlanet has an online regular expression test page.

The Guava method is interesting, though I'm not sure why they use the "spread" variable. Because of it, a subtraction is needed for each shift. I benchmarked a few versions (including a simple hand-coded shifter), and you can find the writeup here:

http://thushw.blogspot.com/2013/06/java-remove-specified-characters-from.html

I think you are looking for code like this, which solves your problem without any explicit looping:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripChars {
    public static void main(String[] args) {
        // prints: Just to clarify I will have strings of varying lengths
        System.out.println(
            replace("Just to clarify, I will have strings of varying lengths.",
                    ",."));

        // prints: Solution to my problem on stackoverflow will cost me 0
        System.out.println(
            replace("Solution to my problem on stackoverflow will cost me $0.",
                    ".$"));
    }

    static String replace(String line, String charsToBeReplaced) {
        // Put a backslash before and a pipe after every character,
        // so ",." becomes the regex \,|\.|
        // Caveat: only safe when charsToBeReplaced contains no letters or
        // digits, because e.g. \d means "any digit" in a regex.
        Pattern p = Pattern.compile("(.{1})");
        Matcher m = p.matcher(charsToBeReplaced);
        return line.replaceAll(m.replaceAll("\\\\$1\\|"), "");
    }
}

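To take care of special regex characters (metacharacters) in the input, the replace method first puts a \ (backslash) before each character and a | (pipe) after each character. So an input of ",." becomes the regex \,|\.| (as a Java string literal, "\\,|\\.|"). This only works when the characters to strip are not letters or digits, because a backslash before a letter or digit has its own meaning in a regex (\d, for example, matches any digit).

Once that is done, the replacement is simple: every matching character is replaced by an empty string.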
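As a possible alternative to escaping by hand, Pattern.quote() can do the escaping, and it is also safe for letters, digits and the backslash itself. This is only a sketch of that swapped-in technique, not the code above; the names QuotedStripper, toPattern and strip are hypothetical:

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class QuotedStripper {
    // Builds an alternation such as \Q,\E|\Q.\E from the characters to strip.
    // Each character is quoted on its own, so none is read as a metacharacter.
    static Pattern toPattern(String charsToStrip) {
        String regex = charsToStrip.codePoints()
            .mapToObj(cp -> Pattern.quote(new String(Character.toChars(cp))))
            .collect(Collectors.joining("|"));
        return Pattern.compile(regex);
    }

    // Returns `line` with every character from `charsToStrip` removed.
    static String strip(String line, String charsToStrip) {
        if (charsToStrip.isEmpty()) {
            return line; // nothing to remove
        }
        return toPattern(charsToStrip).matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(
            strip("Solution to my problem on stackoverflow will cost me $0.", ".$"));
    }
}
```

If the same set of characters is stripped from many strings, the Pattern returned by toPattern can be kept and reused instead of being rebuilt on every call.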

Not used in this solution, but here is a pattern to detect the presence of a special regex character in Java:

Pattern metachars = Pattern.compile(
   "^.*?(\\(|\\[|\\{|\\^|\\-|\\$|\\||\\]|\\}|\\)|\\?|\\*|\\+|\\.).*?$");

I think the example code at your link is good enough, and you can extend it with other valid characters of your choice. You can shorten the code by using a regular expression, though. Take a look at Abdullah's code, or see link1, link2, link3 for more.
