
Check if a String contains encoded characters

Hello, I am looking for a way to detect whether a string has been encoded.

For example:

    String name = "Hellä world";
    String encoded = new String(name.getBytes("utf-8"), "iso8859-1");

The value of the encoded variable is:

HellÃ¤ world

As you can see, there is an A with a tilde (Ã) and another symbol (¤). Is there a way to check whether the output contains encoded characters?

Sounds like you want to check whether a string that was decoded from bytes as Latin-1 (ISO-8859-1) could also have been decoded as UTF-8. That's easy, because illegal byte sequences are replaced by the replacement character U+FFFD (�):

    String recoded = new String(encoded.getBytes("iso-8859-1"), "UTF-8");
    return recoded.indexOf('\uFFFD') == -1; // no U+FFFD replacement character found
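The two lines above are a fragment. Below is a minimal, self-contained sketch of the same idea. It assumes the goal is to flag strings that look like UTF-8 bytes wrongly decoded as ISO-8859-1. The class and method names (MojibakeCheck, looksDoubleEncoded) are hypothetical. The two extra checks are my additions: characters above U+00FF rule the string out, and a string that decodes to itself (such as plain ASCII) is not flagged.

```java
import java.nio.charset.StandardCharsets;

class MojibakeCheck {

    // Hypothetical helper: true if s looks like UTF-8 bytes that were decoded as ISO-8859-1.
    static boolean looksDoubleEncoded(String s) {
        // ISO-8859-1 only covers U+0000..U+00FF, so anything above cannot be this kind of mojibake.
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        // ISO-8859-1 maps each char back to exactly one byte, recovering the original bytes...
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        // ...which are then decoded as UTF-8; malformed sequences become U+FFFD.
        String recoded = new String(bytes, StandardCharsets.UTF_8);
        if (recoded.indexOf('\uFFFD') != -1) {
            return false; // not valid UTF-8, so probably genuine Latin-1 text
        }
        // Plain ASCII decodes to itself; only flag strings that actually change.
        return !recoded.equals(s);
    }

    public static void main(String[] args) {
        String name = "Hell\u00E4 world"; // the question's original text
        String encoded = new String(name.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(looksDoubleEncoded(encoded));       // true
        System.out.println(looksDoubleEncoded(name));          // false
        System.out.println(looksDoubleEncoded("Hello world")); // false
    }
}
```

Treat the result as a heuristic. Genuine Latin-1 text that happens to form valid UTF-8 is flagged too. For example, "Ã©" is the byte pair C3 A9, which is the UTF-8 encoding of "é".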

Your question doesn't make sense. A Java String is a sequence of characters (internally, UTF-16 code units). It doesn't have an encoding until you convert it into bytes. At that point you need to specify one, although you will see a lot of code that relies on the platform default; that is what String.getBytes() with no argument does, for example.

I suggest you read http://kunststube.net/encoding/.
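The point that a String carries no encoding of its own can be illustrated with a small sketch. It converts the single character "ä" to bytes with three different charsets. The class CharsetBytes and its hex helper exist only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetBytes {

    // Formats bytes as space-separated uppercase hex, e.g. "C3 A4".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = "\u00E4"; // one character: LATIN SMALL LETTER A WITH DIAERESIS
        // Same String, different bytes: the charset is chosen at conversion time.
        Charset[] charsets = {
            StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_16BE
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": " + hex(a.getBytes(cs)));
        }
        // getBytes() without an argument uses the JVM's default charset instead.
        System.out.println("default (" + Charset.defaultCharset().name() + "): " + hex(a.getBytes()));
    }
}
```

With the three explicit charsets this prints C3 A4, E4 and 00 E4 respectively. The last line depends on the JVM: since Java 18 (JEP 400) the default charset is UTF-8, and before that it was platform-dependent.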

    String name = "Hellä world";
    String encoded = new String(name.getBytes("utf-8"), "iso8859-1");

This code is just a character corruption bug. You take a UTF-16 string, transcode it to UTF-8, pretend it is ISO-8859-1 and transcode it back to UTF-16, resulting in incorrectly encoded characters.
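The round trip described above can be reproduced and undone with a short sketch. ISO-8859-1 maps every byte value to exactly one character, so this particular corruption loses nothing: encoding the garbled string back to ISO-8859-1 returns the original UTF-8 bytes. The class and method names (RoundTrip, corrupt, repair) are illustrative only.

```java
import java.nio.charset.StandardCharsets;

class RoundTrip {

    // Reproduces the bug from the question: UTF-8 bytes decoded as ISO-8859-1.
    static String corrupt(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses it: get the original bytes back, then decode them with the right charset.
    static String repair(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "Hell\u00E4 world";
        String encoded = corrupt(name);
        // U+00E4 became two chars, U+00C3 and U+00A4, so the length grows from 11 to 12.
        System.out.println(encoded.equals("Hell\u00C3\u00A4 world")); // true
        System.out.println(encoded.length());                         // 12
        System.out.println(repair(encoded).equals(name));             // true
    }
}
```

The repair only works if the garbled string was not modified in between. The opposite mistake, decoding ISO-8859-1 bytes as UTF-8, is not reversible, because invalid sequences are replaced with U+FFFD.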

If I understood your question correctly, this code may help you. The function isEncoded checks whether its parameter can be encoded as ASCII or contains non-ASCII characters.

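    import java.nio.charset.Charset;

    // Returns true if text contains a character that US-ASCII cannot represent:
    // getBytes() replaces such characters with '?', so the round trip no longer
    // reproduces the original string.
    public boolean isEncoded(String text) {
        Charset charset = Charset.forName("US-ASCII");
        String checked = new String(text.getBytes(charset), charset);
        return !checked.equals(text);
    }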

    import org.junit.Assert;
    import org.junit.Test;

    @Test
    public void testAscii() throws Exception {
        Assert.assertFalse(isEncoded("Hello world"));
    }

    @Test
    public void testNonAscii() throws Exception {
        Assert.assertTrue(isEncoded("Hellä world"));
    }

You can also check against another charset by changing the charset variable or by turning it into a parameter.
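Following that suggestion, here is a sketch with the charset moved to a parameter. The second method takes a different route to a similar check. It uses CharsetEncoder.canEncode(CharSequence) from the standard library, which asks the encoder directly instead of comparing a round-tripped String. Both method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodableCheck {

    // Round-trip version of isEncoded with the charset as a parameter: unmappable
    // characters are replaced ('?' for US-ASCII and ISO-8859-1), so the result differs.
    static boolean isEncoded(String text, Charset charset) {
        String checked = new String(text.getBytes(charset), charset);
        return !checked.equals(text);
    }

    // Similar check via the encoder API; a fresh encoder is created for each call.
    static boolean isEncodedViaEncoder(String text, Charset charset) {
        return !charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String name = "Hell\u00E4 world";
        System.out.println(isEncoded(name, StandardCharsets.US_ASCII));   // true
        System.out.println(isEncoded(name, StandardCharsets.ISO_8859_1)); // false
        // U+20AC (euro sign) is outside ISO-8859-1.
        System.out.println(isEncodedViaEncoder("\u20AC", StandardCharsets.ISO_8859_1)); // true
    }
}
```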

I'm not really sure what you are trying to do or what your problem is.

This line doesn't make any sense:

    String encoded = new String(name.getBytes("utf-8"), "iso8859-1");

You are encoding your name as "UTF-8" and then trying to decode it as "iso8859-1".

If you want to encode your name as "iso8859-1", just call name.getBytes("iso8859-1").
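Here is a sketch of that advice, using the same charset name on both sides. Encoding and decoding with the same charset gives the text back unchanged. Note that the String-name overload of getBytes declares the checked UnsupportedEncodingException, while the Charset overload does not. The class Latin1Bytes and its toLatin1 method are hypothetical.

```java
import java.io.UnsupportedEncodingException;

class Latin1Bytes {

    // Encodes with the charset name from the answer; "iso8859-1" is accepted as an
    // alias of ISO-8859-1 (charset lookup is case-insensitive).
    static byte[] toLatin1(String s) throws UnsupportedEncodingException {
        return s.getBytes("iso8859-1");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String name = "Hell\u00E4 world";
        byte[] bytes = toLatin1(name);
        System.out.println(bytes.length);               // 11: one byte per character
        System.out.printf("%02X%n", bytes[4] & 0xFF);  // E4, the ISO-8859-1 code for U+00E4
        // Decoding with the same charset restores the original text.
        System.out.println(new String(bytes, "iso8859-1").equals(name)); // true
    }
}
```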

Please tell us what problem you encountered so that we can help further.

You can check whether your string is encoded with this code:

    public boolean isEncoded(String input) {
        // Returns true as soon as one character is in the Unicode category
        // OTHER_LETTER (Lo: letters without case, such as CJK ideographs).
        // Note: 'Ã' (Lu), '¤' (Sc) and 'ä' (Ll) are not in this category.
        for (char c : input.toCharArray()) {
            if (Character.getType(c) == Character.OTHER_LETTER) {
                return true;
            }
        }
        return false;
    }
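To see which characters the OTHER_LETTER test actually matches, here is a quick sketch that reports the Unicode general category Java assigns to the characters involved here. The class CategoryDemo and its category helper are illustrative only, and the helper maps just the four categories used in this example. The output suggests that the characters from the question's corrupted output fall into other categories, so a check on OTHER_LETTER alone would not flag them.

```java
class CategoryDemo {

    // Maps the few category constants used below to their Unicode abbreviations.
    static String category(char c) {
        switch (Character.getType(c)) {
            case Character.UPPERCASE_LETTER: return "Lu";
            case Character.LOWERCASE_LETTER: return "Ll";
            case Character.OTHER_LETTER:     return "Lo";
            case Character.CURRENCY_SYMBOL:  return "Sc";
            default:                         return "other";
        }
    }

    public static void main(String[] args) {
        // U+00C3 and U+00A4 come from the corrupted output, U+00E4 from the original
        // text, and U+4E2D (a CJK ideograph) is an example of a letter without case.
        for (char c : "\u00C3\u00A4\u00E4\u4E2D".toCharArray()) {
            System.out.println(String.format("U+%04X %s", (int) c, category(c)));
        }
    }
}
```

This prints U+00C3 Lu, U+00A4 Sc, U+00E4 Ll and U+4E2D Lo, one per line.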

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM