
Remove junk characters from a string in Java

I have a string like:

TEST FURNITURE-34_TEST>

My requirement is to remove all the junk characters from the above string, so that the expected output is:

TEST FURNITURE-34_TEST

I have tried the code below:

public static String removeUnPrintableChars(String str) {
    if (str != null) {
        str = str.replaceAll("[^\\x00-\\x7F]", "");
        str = str.replaceAll("[\\p{Cntrl}&&[^\r\n\t]]", "");
        str = str.replaceAll("\\p{C}", "");
        str = str.replaceAll("\\P{Print}", "");
                    
        str = str.substring(0, Math.min(256, str.length()));
        str = str.trim();
        if (str.isEmpty()) {
            str = null;
        }
    }
    return str;
}

But it does nothing. Rather than finding and replacing each character individually, can anyone please help me with a generic solution for removing these kinds of characters from the string?

A simple way is to split the string on "&" and keep the first part:

public class Trim {
    public static void main(String[] args) {
        String myString = "TEST FURNITURE-34_TEST&"
                + "amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;#38;amp;amp;"
                + "#38;amp;#38;gt;";
        // Everything before the first '&' is the text to keep.
        String[] parts = myString.split("&");
        System.out.println(parts[0]);
    }
}

Link to original thread: How to split a string in Java

The sample strings you are presenting (within your post and in comments) are rather ridiculous and, in my opinion, whatever is generating them should be burned... twice.

Try the following method on your string(s). To remove anything else from the input string, add a row for it to the 2D removableItems String array. Each row holds the arguments for one String#replaceAll() call: the first element is the regular expression (regex) matching the item to replace, and the second element is the string to replace each match with.

public static String cleanString(String inputString) {
    String[][] removableItems = {
                                 {"(&?amp;){1,}", " "}, 
                                 {"(#38);?", ""}, 
                                 {"gt;", ""}, {"lt;", ""}
                                };
    
    String desiredString = inputString;
    for (int i = 0; i < removableItems.length; i++) {
            desiredString = desiredString.replaceAll(removableItems[i][0], 
                                                     removableItems[i][1]).trim();
    }
    return desiredString;
}
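As a quick check, here is a minimal sketch of how the method above behaves. The class name CleanStringDemo and the shortened sample string are assumptions made for illustration; the replacement table is the one from the method above, with the loop written as a for-each.

```java
class CleanStringDemo {
    // Same replacement table as the cleanString method above.
    static String cleanString(String inputString) {
        String[][] removableItems = {
            {"(&?amp;){1,}", " "},
            {"(#38);?", ""},
            {"gt;", ""}, {"lt;", ""}
        };
        String desiredString = inputString;
        for (String[] item : removableItems) {
            // item[0] is the regex, item[1] is the replacement.
            desiredString = desiredString.replaceAll(item[0], item[1]).trim();
        }
        return desiredString;
    }

    public static void main(String[] args) {
        // Shortened sample (an assumption, not the exact string from the question).
        String sample = "TEST FURNITURE-34_TEST&amp;amp;#38;amp;#38;gt;";
        System.out.println(cleanString(sample));     // TEST FURNITURE-34_TEST
        // An entity in the middle of the text is replaced by a single space.
        System.out.println(cleanString("A&amp;B"));  // A B
    }
}
```

Note that because the "amp;" pattern is replaced with a space, an entity sitting between two words leaves a space behind; only leading and trailing spaces are removed by trim().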

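You can use this method. It uses a word boundary (\b) to match each run of word characters that ends in a semicolon, together with an optional leading "&" and an optional trailing "#".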

public static String removeUnPrintableChars(String str) {
    if (str != null) {
        str = str.replaceAll("(\\b&?\\w+;#?)", "");
    }

    return str;
}
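A small sketch of what this regex does and does not distinguish, in case it helps. The class name EntityRegexDemo and both sample strings are assumptions for illustration; the regex itself is the one from the method above.

```java
class EntityRegexDemo {
    // Same regex as the removeUnPrintableChars method above.
    static String removeUnPrintableChars(String str) {
        if (str != null) {
            str = str.replaceAll("(\\b&?\\w+;#?)", "");
        }
        return str;
    }

    public static void main(String[] args) {
        // Shortened sample (an assumption, not the exact string from the question).
        String sample = "TEST FURNITURE-34_TEST&amp;amp;#38;amp;#38;gt;";
        System.out.println(removeUnPrintableChars(sample));      // TEST FURNITURE-34_TEST
        // Caveat: any word directly followed by ';' is removed as well.
        System.out.println(removeUnPrintableChars("price;10"));  // 10
    }
}
```

So this is probably only safe when the real text never contains a word immediately followed by a semicolon.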
