
Using Java, how does this charAt() turn a string into an int?

I'm a beginner at Java, and I'm trying to understand and explain to myself how this for loop works. The instructions say it converts each letter in each word into its numeric Unicode equivalent by using loops.

Based on my understanding, the for loop goes through the entire word using .length(), and the current position is stored in int i, which is then passed as the argument of charAt(i). charAt returns each character in the word, and the (int) cast then converts it into an int that is stored in finalInt.

So my question is: where does the Unicode number come from? How does it know that it's Unicode?

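String word1 = "hello"; // example value; the original snippet never assigned word1
int finalInt;

for (int i = 0; i < word1.length(); i++) {
    // charAt(i) returns the char (a UTF-16 code unit) at index i;
    // the (int) cast widens that char to its numeric value
    finalInt = (int) word1.charAt(i);
    System.out.println(finalInt);
}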

Java's Character class is based on Unicode. From its documentation:

Character information is based on the Unicode Standard, version 6.0.0.

https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html
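To illustrate that (a sketch, not part of the original answer; the class name UnicodeCharacterDemo and the sample characters are made up for the example): Character's classification methods consult Unicode character data, so they recognize letters and digits far outside ASCII. Character.getName requires Java 7 or later.

```java
// Hypothetical demo class; the sample characters are arbitrary examples.
class UnicodeCharacterDemo {
    // Returns the Unicode name of a codepoint, e.g. "LATIN CAPITAL LETTER A".
    static String nameOf(int codePoint) {
        return Character.getName(codePoint);
    }

    public static void main(String[] args) {
        char eAcute = '\u00E9';       // LATIN SMALL LETTER E WITH ACUTE
        char arabicThree = '\u0663';  // ARABIC-INDIC DIGIT THREE

        System.out.println(Character.isLetter(eAcute));              // true
        System.out.println(Character.isDigit(arabicThree));          // true
        System.out.println(Character.getNumericValue(arabicThree));  // 3
        System.out.println(nameOf('A'));  // LATIN CAPITAL LETTER A
    }
}
```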

Also, char and int can be converted to each other. Please refer to: Convert int to char in java
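A minimal sketch of both directions (the class and method names are hypothetical): char is an unsigned 16-bit integral type, so char to int is a widening conversion that needs no cast, while int to char is a narrowing conversion that needs an explicit cast and keeps only the low 16 bits.

```java
// Hypothetical helper class illustrating char <-> int conversion.
class CharIntConversion {
    static int toInt(char c) {
        return c;             // widening: implicit, no information lost
    }

    static char toChar(int value) {
        return (char) value;  // narrowing: explicit cast, keeps the low 16 bits only
    }

    public static void main(String[] args) {
        System.out.println(toInt('A'));        // 65
        System.out.println(toChar(66));        // B
        System.out.println('a' + 1);           // 98 (char arithmetic promotes to int)
        System.out.println((char) ('a' + 1));  // b
    }
}
```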

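Check the ASCII table: http://www.asciitable.com/
Your code is transforming a char (last column) into its numerical representation (first column). This works because, for characters in the ASCII range (0 to 127), the Unicode value is the same as the ASCII code.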
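A small sketch showing where the ASCII table stops being enough (the class name and the sample word "Café" are assumptions for the example): the plain letters match the table, but é has the value 233, which a 7-bit ASCII table does not list.

```java
// Hypothetical demo class; "Café" is just a sample word.
class AsciiCheck {
    // ASCII covers only the values 0 to 127.
    static boolean isAscii(char c) {
        return c <= 127;
    }

    public static void main(String[] args) {
        String word = "Caf\u00E9";  // "Café"
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            // C -> 67, a -> 97, f -> 102 are in the ASCII table; e-acute -> 233 is not
            System.out.println(c + " -> " + (int) c + (isAscii(c) ? " (ASCII)" : " (not ASCII)"));
        }
    }
}
```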

Using Java, how does this charAt() turn a string into an int?

A Java String models a string as an array of char (not int) values, so charAt is just indexing the (conceptual) array. You can therefore say that a string is a sequence of integer values ... representing characters.
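The "conceptual array" view can be sketched like this (the class name and the word "Java" are just for illustration): charAt(i) yields the same char as element i of toCharArray(), and each of those chars is simply a 16-bit number.

```java
// Hypothetical demo class; "Java" is a sample word.
class CharArrayView {
    // Collects the numeric value of every char, as the question's loop does.
    static int[] charValues(String s) {
        int[] values = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            values[i] = s.charAt(i);  // char widens to int implicitly
        }
        return values;
    }

    public static void main(String[] args) {
        String s = "Java";
        char[] units = s.toCharArray();  // a copy of the string's chars
        for (int i = 0; i < s.length(); i++) {
            // charAt(i) and the i-th array element are the same char
            System.out.println(units[i] == s.charAt(i));  // true
        }
        System.out.println(java.util.Arrays.toString(charValues(s)));  // [74, 97, 118, 97]
    }
}
```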

(Under the hood, different versions of Java actually use a variety of implementation approaches. In some versions the actual representation is not a char[]; for example, since Java 9 "compact strings" store the characters in a byte[]. But that is all hidden from sight, and you can safely ignore it.)

So my question is: where does the Unicode number come from?

It comes from the code that created the String, i.e. the code that called new String(...).

  • If the String is constructed from a char[], it is assumed that the characters in the array are UTF-16 codeunits in a sequence that is a valid UTF-16 representation.

  • If the String is constructed from a byte[], the byte sequence is decoded from some specified or implied encoding. If you supply an encoding (e.g. a Charset), that will be used; otherwise the platform's default charset is used. Either way, the decoder is responsible for producing valid Unicode.
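Both construction routes can be sketched as follows (hypothetical class and method names; UTF-8 is chosen here only as an example charset). The char[] is taken as UTF-16 codeunits unchanged, while the byte[] goes through a decoder for the named charset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for the two ways of building a String.
class StringConstruction {
    // From char[]: the chars are used as UTF-16 codeunits, unchanged.
    static String fromChars(char[] units) {
        return new String(units);
    }

    // From byte[]: the bytes are decoded with an explicitly named charset.
    // Calling new String(bytes) without a charset would use the default charset.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(fromChars(new char[] {'h', 'i'}));  // hi
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9};               // UTF-8 encoding of e-acute
        String s = fromUtf8(utf8);
        // two bytes decode to one char whose value is 233
        System.out.println(s.length() + " " + (int) s.charAt(0));  // 1 233
    }
}
```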

Sometimes these things break. For instance, if your application provides a byte[] encoded in one encoding and tells the String constructor it is in a different encoding, you are liable to get nonsense Unicode in the String. This is often called mojibake.
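A minimal sketch of that failure, assuming the bytes were written as UTF-8 but decoded as ISO-8859-1 (the class name is hypothetical): each of the two UTF-8 bytes of é becomes its own, wrong, character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: encode with one charset, decode with another.
class MojibakeDemo {
    static String decodeWrongly(String original) {
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        // Mismatch: the bytes are UTF-8, but we claim they are ISO-8859-1.
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = decodeWrongly("caf\u00E9");  // "café"
        // garbled is "cafÃ©": e-acute became the two chars 0xC3 and 0xA9
        System.out.println(garbled.length());  // 5
    }
}
```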

How does it know that it's Unicode?

String is designed to be Unicode-based.

The code that needs to know is the code that forms the strings from other things. The String class just assumes that its content is meaningful. (At one level ... it doesn't care. You can populate a String with malformed UTF-16 or total nonsense, and the String will faithfully record and reproduce the nonsense.)
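For example (a sketch; the class name is hypothetical): a lone high surrogate with no low surrogate after it is malformed UTF-16, yet String accepts and stores it without complaint.

```java
// Hypothetical demo class: String does not validate UTF-16.
class MalformedUtf16 {
    // 0xD800 is a high surrogate with no low surrogate after it: malformed UTF-16.
    static String withLoneSurrogate() {
        return new String(new char[] {'a', (char) 0xD800});
    }

    public static void main(String[] args) {
        String s = withLoneSurrogate();
        System.out.println(s.length());                              // 2
        System.out.println(Character.isHighSurrogate(s.charAt(1)));  // true
        System.out.println(Integer.toHexString(s.charAt(1)));        // d800
    }
}
```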


Having said that, there is an important mistake in your code.

The charAt method does not return a Unicode codepoint. A String is primarily modeled as a sequence of UTF-16 codeunits, and charAt returns those.

Unicode codepoints are actually numbers in the range 0x0 to 0x10FFFF. That doesn't fit into a char ... which is limited to 0x0 to 0xFFFF.
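The two limits are available as JDK constants, which makes the mismatch easy to check (a sketch; the class name, the helper fitsInOneChar, and the example codepoint 0x1F600 are illustrative choices):

```java
// Hypothetical demo class comparing the codepoint range with the char range.
class CodePointRange {
    // True if the codepoint is small enough to be stored in a single char.
    static boolean fitsInOneChar(int codePoint) {
        return codePoint <= Character.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(Character.MAX_CODE_POINT));  // 10ffff
        System.out.println(Integer.toHexString(Character.MAX_VALUE));       // ffff
        int grinningFace = 0x1F600;  // an example codepoint above 0xFFFF
        System.out.println(fitsInOneChar(grinningFace));        // false
        System.out.println(Character.charCount(grinningFace));  // 2 chars needed
    }
}
```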

UTF-16 encodes Unicode codepoints into 16-bit codeunits. So the value returned by charAt represents either an entire Unicode codepoint (for codepoints in the range 0x0 to 0xFFFF) or the top or bottom part of a codepoint, called the high and low surrogate of a surrogate pair (for codepoints larger than 0xFFFF).
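The split into top and bottom parts follows fixed arithmetic: subtract 0x10000, then put the top 10 bits into a high surrogate (0xD800 base) and the bottom 10 bits into a low surrogate (0xDC00 base). A sketch with hypothetical method names, checked against the JDK's Character.highSurrogate and Character.lowSurrogate (Java 7+):

```java
// Hypothetical demo class: manual UTF-16 surrogate-pair arithmetic.
class SurrogatePair {
    static char high(int codePoint) {
        return (char) (0xD800 + ((codePoint - 0x10000) >> 10));    // top 10 bits
    }

    static char low(int codePoint) {
        return (char) (0xDC00 + ((codePoint - 0x10000) & 0x3FF));  // bottom 10 bits
    }

    public static void main(String[] args) {
        int cp = 0x1F600;  // GRINNING FACE, an example emoji
        System.out.println(Integer.toHexString(high(cp)));             // d83d
        System.out.println(Integer.toHexString(low(cp)));              // de00
        System.out.println(high(cp) == Character.highSurrogate(cp));   // true
        System.out.println(low(cp) == Character.lowSurrogate(cp));     // true
        // Recombining the pair gives back the original codepoint.
        System.out.println(Character.toCodePoint(high(cp), low(cp)) == cp);  // true
    }
}
```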

If you want String to return (complete) Unicode codepoints, you need to use String.codePointAt. But it is important to read the javadocs carefully to understand how the method should be used. (It may be simpler to use the String.codePoints() method.)
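A sketch of both approaches (hypothetical class and method names). The detail the javadocs warn about: with codePointAt you must advance the index by Character.charCount(cp), otherwise the next call lands on the low surrogate of a pair. codePoints() (Java 8+) does the stepping for you.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: two ways to iterate over whole codepoints.
class CodePointIteration {
    static List<Integer> viaCodePointAt(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result.add(cp);
            i += Character.charCount(cp);  // 1 for BMP codepoints, 2 for higher ones
        }
        return result;
    }

    static int[] viaCodePoints(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String s = "hi\uD83D\uDE00";  // "hi" followed by the emoji U+1F600
        System.out.println(viaCodePointAt(s));                            // [104, 105, 128512]
        System.out.println(java.util.Arrays.toString(viaCodePoints(s)));  // [104, 105, 128512]
    }
}
```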

At any rate, what this means is that your code is NOT assigning a Unicode codepoint to finalInt in all cases. It works for Unicode characters in the BMP (plane 0), but not for the higher planes. It will break for the Unicode codepoints of emoji, for example.
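To see the breakage concretely, here is a sketch that runs the question's charAt approach on a word containing an emoji (the class name and sample word are assumptions): the emoji produces two values, 55357 and 56832, neither of which is its codepoint 128512, and the string's length (4) differs from its codepoint count (3).

```java
// Hypothetical demo class: the question's charAt loop on a non-BMP character.
class CharAtVsCodePoint {
    // One int per char (UTF-16 codeunit), as in the question's loop.
    static int[] viaCharAt(String word) {
        int[] out = new int[word.length()];
        for (int i = 0; i < word.length(); i++) {
            out[i] = (int) word.charAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String word = "hi\uD83D\uDE00";  // "hi" plus the emoji U+1F600
        System.out.println(java.util.Arrays.toString(viaCharAt(word)));  // [104, 105, 55357, 56832]
        System.out.println(word.length());                       // 4 chars
        System.out.println(word.codePointCount(0, word.length()));  // 3 codepoints
    }
}
```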

The technical posts on this site are published under the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 