Remove junk characters from a string in Java

I have a string like:

TEST FURNITURE-34_TEST&amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;#38;amp;amp;#38;amp;#38;gt;

My requirement is to remove all the junk characters from the above string, so my expected output is:

TEST FURNITURE-34_TEST

I have tried the code below:

public static String removeUnPrintableChars(String str) {
    if (str != null) {
        str = str.replaceAll("[^\\x00-\\x7F]", "");
        str = str.replaceAll("[\\p{Cntrl}&&[^\r\n\t]]", "");
        str = str.replaceAll("\\p{C}", "");
        str = str.replaceAll("\\P{Print}", "");
                    
        str = str.substring(0, Math.min(256, str.length()));
        str = str.trim();
        if (str.isEmpty()) {
            str = null;
        }
    }
    return str;
}

But it does nothing. Rather than finding and replacing each character individually, can anyone please help me with a generic solution for removing these kinds of characters from the string?

A simple way is to split the string:

public class Trim {
    public static void main(String[] args) {
        String myString = "TEST FURNITURE-34_TEST&"
            + "amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;#38;amp;amp;"
            + "#38;amp;#38;gt;";
        // Everything before the first '&' is the text to keep.
        String[] parts = myString.split("&");
        String part1 = parts[0];
        System.out.println(part1);
    }
}

Link to original thread: How to split a string in Java
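One edge case with the split approach: if the input consists only of & characters, split("&") returns an empty array and parts[0] throws an ArrayIndexOutOfBoundsException. A minimal sketch that sidesteps this by cutting at the first & with indexOf follows. The class and method names are hypothetical, and it assumes the text you want to keep never contains a literal &.

```java
class JunkTail {
    // Hypothetical helper: keeps everything before the first '&'.
    // Assumption: the wanted text never contains a literal '&'.
    static String stripFromFirstAmpersand(String input) {
        if (input == null) {
            return null;
        }
        int cut = input.indexOf('&');
        // No '&' present: there is nothing to cut off.
        String kept = (cut < 0) ? input : input.substring(0, cut);
        return kept.trim();
    }

    public static void main(String[] args) {
        // Shortened version of the sample string from the question.
        String sample = "TEST FURNITURE-34_TEST&amp;amp;#38;amp;#38;gt;";
        System.out.println(stripFromFirstAmpersand(sample)); // TEST FURNITURE-34_TEST
        // An input made only of '&' yields an empty string instead of an exception.
        System.out.println(stripFromFirstAmpersand("&&&").isEmpty()); // true
    }
}
```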

The sample strings you are presenting (within your post and in comments) are rather ridiculous and, in my opinion, whatever is generating them should be burned... twice.

Try the following method on your string(s). To remove additional items from the input string, add them to the 2D removableItems String array. Each row of this array holds the arguments for one String#replaceAll() call: the first element is a regular expression (regex) matching the item to replace, and the second element is the string to replace each match with.

public static String cleanString(String inputString) {
    String[][] removableItems = {
                                 {"(&?amp;){1,}", " "}, 
                                 {"(#38);?", ""}, 
                                 {"gt;", ""}, {"lt;", ""}
                                };
    
    String desiredString = inputString;
    for (int i = 0; i < removableItems.length; i++) {
            desiredString = desiredString.replaceAll(removableItems[i][0], 
                                                     removableItems[i][1]).trim();
    }
    return desiredString;
}
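As a variation on the same idea, the separate rules can be folded into one precompiled pattern that treats a whole run of entity fragments as a single match. This is only a sketch under stated assumptions: the class name JunkRuns, the method name clean, and the combined regex are hypothetical and not part of the method above. Like the original rules, it will also strip legitimate text that happens to end in one of the tokens (for example the "amp;" in "lamp;").

```java
import java.util.regex.Pattern;

class JunkRuns {
    // One or more consecutive entity fragments, each optionally led by '&'.
    // Hypothetical combined form of the amp; / #38; / gt; / lt; rules.
    private static final Pattern JUNK_RUN =
            Pattern.compile("(?:&?(?:amp|#38|gt|lt);)+");

    static String clean(String input) {
        if (input == null) {
            return null;
        }
        // Each run of junk collapses to one space; outer whitespace is trimmed.
        return JUNK_RUN.matcher(input).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Shortened version of the sample string from the question.
        System.out.println(clean("TEST FURNITURE-34_TEST&amp;amp;#38;amp;#38;gt;")); // TEST FURNITURE-34_TEST
        System.out.println(clean("A&amp;amp;B")); // A B
    }
}
```

Compiling the pattern once avoids recompiling a regex on every replaceAll() call, which may matter if the method runs over many strings.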

You can use this method. It works by matching at word boundaries.

public static String removeUnPrintableChars(String str) {
    if (str != null) {
        // Remove each word that ends in ';', with an optional leading '&' and trailing '#'.
        str = str.replaceAll("(\\b&?\\w+;#?)", "");
    }
    return str;
}
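Be aware that this pattern removes any word immediately followed by a semicolon, not only entity fragments. The sketch below (the class name BoundaryDemo and method name strip are hypothetical) applies the same regex to a shortened sample and to ordinary text containing a semicolon, so you can judge whether that trade-off is acceptable for your data.

```java
class BoundaryDemo {
    // Same regex as the method above, wrapped in a hypothetical helper.
    static String strip(String str) {
        return str == null ? null : str.replaceAll("(\\b&?\\w+;#?)", "");
    }

    public static void main(String[] args) {
        // Entity residue is removed as intended.
        System.out.println(strip("TEST FURNITURE-34_TEST&amp;amp;#38;gt;")); // TEST FURNITURE-34_TEST
        // Side effect: an ordinary word followed by ';' is removed too.
        System.out.println("[" + strip("Chair; oak") + "]"); // [ oak]
    }
}
```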
