
Unable to obtain ASCII code in Java

I'm trying to obtain the numeric values of ASCII characters as listed on http://www.ascii-code.com/.

String str = "™æ‡©Æ";
for(int i = 0; i < str.length() ; i++) {
    char c = str.charAt(i);
    int code = (int) c;
    System.out.println(c + ":" +code);
}

Output:

™:8482
æ:230
‡:8225
©:169
Æ:198

My question is: why are the values of '™' and '‡' not 153 and 135, respectively? And how can I obtain those values, if possible?

Characters with a value of 128 or above are not ASCII characters; it is more accurate to call them Unicode characters. "Extended ASCII" is not ASCII either: the 128-255 range on that site is one particular 8-bit code page (Windows-1252), not part of the ASCII standard. You would do better to consult the Unicode tables.

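Note also that Java uses Unicode internally, not ASCII. More precisely, a Java char is a UTF-16 code unit, so (int) c gives the character's UTF-16 value. For '™' that is its Unicode code point U+2122 (8482), not its Windows-1252 code 153.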
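A minimal sketch of the UTF-16 point. It uses U+1F600, an emoji chosen here only as an example of a character outside the Basic Multilingual Plane (it is not from the question); the class name Utf16Demo is hypothetical. All five characters in the question lie inside the BMP, which is why (int) c happened to equal the code point there.

```java
public class Utf16Demo {
    // U+1F600 GRINNING FACE, written as a UTF-16 surrogate pair
    static final String EMOJI = "\uD83D\uDE00";

    public static void main(String[] args) {
        // length() counts UTF-16 code units (chars): prints 2
        System.out.println(EMOJI.length());
        // codePointCount() counts Unicode code points: prints 1
        System.out.println(EMOJI.codePointCount(0, EMOJI.length()));
        // Casting a single char yields only the high surrogate: prints 55357 (0xD83D)
        System.out.println((int) EMOJI.charAt(0));
        // codePointAt() decodes the whole surrogate pair: prints 128512 (0x1F600)
        System.out.println(EMOJI.codePointAt(0));
    }
}
```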

You may refer to this and to the List of Unicode characters.

ASCII assigns values to only 128 characters, codes 0 to 127 (a-z, A-Z, 0-9, space, some punctuation, and some control characters). The first 128 Unicode code points are the same as ASCII.
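A small sketch of the 0-127 rule; AsciiCheck and isAscii are hypothetical names. It also shows that encoding with Java's US-ASCII charset will not produce 153 for ™: characters outside ASCII are replaced with '?' (63).

```java
import java.nio.charset.StandardCharsets;

public class AsciiCheck {
    // A char is ASCII only if its value is in the range 0-127
    static boolean isAscii(char c) {
        return c < 128;
    }

    public static void main(String[] args) {
        // 'A', TRADE MARK SIGN (U+2122), LATIN SMALL LETTER AE (U+00E6)
        for (char c : "A\u2122\u00E6".toCharArray()) {
            // For ASCII characters the Unicode value and the ASCII code coincide
            System.out.println(c + " " + (int) c + " ascii=" + isAscii(c));
        }
        // US-ASCII cannot represent U+2122, so getBytes substitutes '?' (63)
        byte[] bytes = "A\u2122".getBytes(StandardCharsets.US_ASCII);
        System.out.println(bytes[0] + " " + bytes[1]); // prints "65 63"
    }
}
```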

Unicode is a computing industry standard designed to consistently and uniquely encode the characters used in written languages throughout the world. Unicode code points are conventionally written in hexadecimal with a U+ prefix; for example, ™ is U+2122, which is 8482 in decimal.
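A sketch that prints each character of the question's string with its code point in decimal and in U+ notation; CodePointHex and toUPlus are hypothetical names. For ™ it should print 8482 and U+2122.

```java
public class CodePointHex {
    // Formats a code point in the conventional U+XXXX notation (at least 4 hex digits)
    static String toUPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        // The question's string, written with escapes
        String str = "\u2122\u00E6\u2021\u00A9\u00C6";
        str.codePoints().forEach(cp ->
            System.out.println(new String(Character.toChars(cp)) + " " + cp + " " + toUPlus(cp)));
    }
}
```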

There are two common encodings of Unicode: UTF-8, which uses 1 to 4 bytes per code point (so for the first 128 characters UTF-8 is exactly the same as ASCII), and UTF-16, which uses 2 or 4 bytes.
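A sketch comparing encoded sizes; EncodedLengths, utf8Length and utf16Length are illustrative names. UTF_16BE is used because Java's plain UTF-16 encoder writes a 2-byte byte-order mark first, which would inflate the count. The expected sizes are: A is 1 byte in UTF-8 and 2 in UTF-16, æ is 2 and 2, ™ is 3 and 2, and U+1F600 is 4 and 4.

```java
import java.nio.charset.StandardCharsets;

public class EncodedLengths {
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF_16BE avoids the byte-order mark that StandardCharsets.UTF_16 prepends
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // U+0041, U+00E6, U+2122, and U+1F600 (outside the BMP, a surrogate pair)
        String[] samples = {"A", "\u00E6", "\u2122", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X utf8=%d utf16=%d%n",
                s.codePointAt(0), utf8Length(s), utf16Length(s));
        }
    }
}
```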

While I did not look in the Javadocs for a converter, I did write an example to show why ASCII and Java's Unicode strings are not easily compatible. The code below converts each character into a UTF-8 byte array and then into a string representation of that array. Rather than relying on a Java class for the conversion, I would suggest building an array of the ASCII (code page) equivalents yourself and looking values up in it for output.

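import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Dinker {
    // Prints every char from 0 to 8191 together with its UTF-8 byte sequence
    public void showChars() {
        int end = 8192;
        for (int i = 0; i < end; ++i) {
            char c = (char) i;
            // StandardCharsets.UTF_8 avoids the checked UnsupportedEncodingException
            // declared by getBytes("UTF8"), so no try/catch or Logger is needed
            byte[] data = Character.toString(c).getBytes(StandardCharsets.UTF_8);
            // Arrays.toString shows signed bytes, e.g. [-62, -87] for U+00A9
            String byteStr = Arrays.toString(data);
            System.out.println(i + " char is " + c + " or " + byteStr);
        }
    }
}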
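A rough sketch of the lookup-table approach suggested above, assuming the target is the Windows-1252 code page (the one that yields 153 and 135). Cp1252Lookup, HIGH_CODES and extendedCode are hypothetical names, and only five of the 0x80-0x9F entries are filled in; a real table would need all of them. Map.of requires Java 9 or later.

```java
import java.util.Map;

public class Cp1252Lookup {
    // Hypothetical, deliberately partial table: Unicode code point -> Windows-1252 code
    private static final Map<Integer, Integer> HIGH_CODES = Map.of(
        0x20AC, 128,  // EURO SIGN
        0x2026, 133,  // HORIZONTAL ELLIPSIS
        0x2021, 135,  // DOUBLE DAGGER
        0x2022, 149,  // BULLET
        0x2122, 153   // TRADE MARK SIGN
    );

    // Returns the "extended ASCII" (Windows-1252) code, or -1 if it is not known
    static int extendedCode(int codePoint) {
        // 0-127 (ASCII) and 160-255 map to themselves in Windows-1252
        if (codePoint < 0x80 || (codePoint >= 0xA0 && codePoint <= 0xFF)) {
            return codePoint;
        }
        return HIGH_CODES.getOrDefault(codePoint, -1);
    }

    public static void main(String[] args) {
        // The question's string; prints 153, 230, 135, 169, 198
        "\u2122\u00E6\u2021\u00A9\u00C6".codePoints()
            .forEach(cp -> System.out.println(extendedCode(cp)));
    }
}
```

A hand-built table makes the mapping explicit, but the JDK already ships a windows-1252 charset, which is what the next answer relies on.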

To answer the second question, encode the string as Windows-1252 and read the bytes as unsigned values:

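import java.nio.charset.Charset;

final String str = "™æ‡©Æ";

// Windows-1252 assigns 153 to ™ and 135 to ‡, matching the
// "extended ASCII" table on ascii-code.com
final byte[] cp1252Bytes = str.getBytes(Charset.forName("windows-1252"));
for (final byte b : cp1252Bytes) {
    // Java bytes are signed; masking with 0xFF yields the unsigned value 0-255
    final int code = b & 0xFF;
    System.out.println(code);
}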
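One caveat: getBytes silently replaces any character that Windows-1252 cannot represent with '?' (63) instead of reporting it. A sketch that checks CharsetEncoder.canEncode first; Cp1252Check and cp1252Code are hypothetical names, and U+03BB (Greek small letter lambda) is just an example of an unmappable character.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class Cp1252Check {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Returns the Windows-1252 code for c, or -1 if the code page has no such character
    static int cp1252Code(char c) {
        // A fresh encoder per call keeps this simple and thread-safe
        CharsetEncoder encoder = CP1252.newEncoder();
        if (!encoder.canEncode(c)) {
            return -1;
        }
        // Safe now: c is known to map to exactly one Windows-1252 byte
        return String.valueOf(c).getBytes(CP1252)[0] & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(cp1252Code('\u2122')); // TRADE MARK SIGN: 153
        System.out.println(cp1252Code('\u03BB')); // GREEK SMALL LETTER LAMDA: -1
    }
}
```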

Associating each code with the character it came from takes a little more work:

import java.nio.charset.Charset;

final String str = "™æ‡©Æ";
final Charset cp1252 = Charset.forName("windows-1252");

final int length = str.length();
for (int offset = 0; offset < length; ) {
    // Read a whole code point, which may span two chars (a surrogate pair)
    final int codepoint = str.codePointAt(offset);
    final int codepointLength = Character.charCount(codepoint);
    final String codepointString = str.substring(offset, offset + codepointLength);
    System.out.println(codepointString);
    // Print the unsigned Windows-1252 byte value(s) for this character
    final byte[] cp1252Bytes = codepointString.getBytes(cp1252);
    for (final byte code : cp1252Bytes) {
        System.out.println(code & 0xFF);
    }
    offset += codepointLength;
}

This is somewhat easier with Java 8's String.codePoints() method:

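import java.nio.charset.Charset;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CodePoints {
    public static void main(String[] args) {
        final String str = "™æ‡©Æ";
        final Charset cp1252 = Charset.forName("windows-1252");

        // Turn each code point back into a one-character String,
        // then print it next to its Windows-1252 byte value(s)
        str.codePoints()
            .mapToObj(i -> new String(Character.toChars(i)))
            .forEach(c -> System.out.println(
                String.format("%s %s", c, unsignedBytesToString(c.getBytes(cp1252)))));
    }

    // unsignedBytesToString is not shown in the original answer; this is one
    // possible implementation that joins the unsigned byte values with spaces
    static String unsignedBytesToString(final byte[] bytes) {
        return IntStream.range(0, bytes.length)
            .mapToObj(i -> String.valueOf(bytes[i] & 0xFF))
            .collect(Collectors.joining(" "));
    }
}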

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM