
How do I subtract one character from another in Rust?

In Java, I can do this:

int diff = 'Z' - 'A'; // 25

I have tried the same in Rust:

fn main() {
    'Z' - 'A';
}

but the compiler complains:

error[E0369]: binary operation `-` cannot be applied to type `char`
 --> src/main.rs:2:5
  |
2 |     'Z' - 'A';
  |     ^^^^^^^^^
  |
  = note: an implementation of `std::ops::Sub` might be missing for `char`

How can I do the equivalent operation in Rust?

The operation is meaningless in a Unicode world and rarely meaningful even in an ASCII world, which is why Rust doesn't provide it directly. There are two ways to do it, depending on your use case:

  • Cast the characters to their scalar value: 'Z' as u32 - 'A' as u32
  • Use byte character literals: b'Z' - b'A'
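As a minimal sketch of both options: the helper name signed_diff is made up for illustration and is not part of the standard library. It widens to i64 so that the subtraction cannot underflow when the first character is the smaller one.

```rust
/// Hypothetical helper: signed distance between two chars,
/// measured in Unicode scalar values.
fn signed_diff(a: char, b: char) -> i64 {
    i64::from(u32::from(a)) - i64::from(u32::from(b))
}

fn main() {
    // Option 1: cast to the Unicode scalar value; the result is a u32.
    let diff_scalar: u32 = 'Z' as u32 - 'A' as u32;

    // Option 2: byte literals are plain u8 values, so `-` works directly.
    let diff_bytes: u8 = b'Z' - b'A';

    // Going back from a number to a char is fallible, hence the Option.
    let back: Option<char> = char::from_u32('A' as u32 + diff_scalar);

    println!("{diff_scalar} {diff_bytes} {back:?} {}", signed_diff('A', 'Z'));
}
```

Note that plain unsigned subtraction with the operands swapped would overflow, which is why the sketch uses a signed type for the general case.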

Math is not meaningless in Unicode; saying so misses the most amazing feature of UTF-8.

Any byte with a 0 high bit is valid US-ASCII, and a 7-bit US-ASCII document is valid UTF-8. You can treat UTF-8 as US-ASCII bytes provided all comparisons and math deal with values below 128. This is by design of UTF-8: C code tends to just work, but Rust makes this more complicated.

Given a string value: &str, grab its bytes with as_bytes():

for byt in value.as_bytes() {
    let mut c = *byt; // c is a u8 (unsigned byte)

    // if we are dealing with US-ASCII uppercase letters...
    if c >= b'A' && c <= b'Z' {
        // math works: this converts to US-ASCII lowercase
        c += 32;
    }

    // to treat the u8 as a Rust char, cast it
    // (only meaningful for bytes below 128; higher bytes are
    // fragments of a multi-byte character)
    let ch = c as char;
    // now you can write sane code like
    if ch == ':' || ch == ' ' || ch == '/' {
        // ....
    }
    // but you can't do math on ch anymore
}

This math is not meaningless: +32 is a handy lowercase function for A-Z, and it is a valid treatment of UTF-8 text.
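To make the +32 trick concrete, here is a rough sketch wrapped in a function. The name lower_ascii is hypothetical; the standard library already offers str::to_ascii_lowercase, which the sketch uses only as a cross-check.

```rust
/// Hypothetical helper: lowercases only the ASCII letters A-Z, byte by byte.
fn lower_ascii(value: &str) -> String {
    let bytes: Vec<u8> = value
        .bytes()
        .map(|b| if (b'A'..=b'Z').contains(&b) { b + 32 } else { b })
        .collect();
    // Only bytes below 128 were changed, and each stayed below 128,
    // so the result is still valid UTF-8.
    String::from_utf8(bytes).expect("ASCII-only edits keep UTF-8 valid")
}

fn main() {
    let s = "Hello, WÖRLD!";
    // The non-ASCII 'Ö' is left untouched; only A-Z are shifted.
    println!("{}", lower_ascii(s));
    assert_eq!(lower_ascii(s), s.to_ascii_lowercase());
}
```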

It is no accident that 'a' + 1 = 'b' in UTF-8. ASCII-betical ordering may not match real-world alphabetical ordering, but it is still useful because it performs well over a common range of characters.

It is not meaningless that '3' + 1 = '4' in ASCII.
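In Rust this kind of arithmetic has to go through u8. A small sketch, with next_ascii and digit_value as made-up helper names:

```rust
/// Hypothetical helper: the ASCII character that follows `c`,
/// or None if that would leave the ASCII range.
fn next_ascii(c: u8) -> Option<char> {
    if c < 127 { Some((c + 1) as char) } else { None }
}

/// Hypothetical helper: numeric value of an ASCII digit byte.
fn digit_value(c: u8) -> Option<u8> {
    if c.is_ascii_digit() { Some(c - b'0') } else { None }
}

fn main() {
    println!("{:?}", next_ascii(b'3'));  // the character after '3'
    println!("{:?}", next_ascii(b'a'));  // the character after 'a'
    println!("{:?}", digit_value(b'7')); // '7' as a number
}
```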

Treating a UTF-8 string as bytes will not break it: every byte of a multi-byte character is 128 or higher, so simple code like if (c == 'a') will work even if you have smiley poos in the string.
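A quick way to convince yourself, assuming a made-up helper called count_byte: counting an ASCII byte in a string that also contains U+1F4A9 gives the same answer as counting the character, because no byte of the emoji's encoding can collide with an ASCII value.

```rust
/// Hypothetical helper: counts occurrences of one byte value in a string.
fn count_byte(value: &str, needle: u8) -> usize {
    value.bytes().filter(|&b| b == needle).count()
}

fn main() {
    let s = "a\u{1F4A9}banana";

    // Every byte of the emoji's UTF-8 encoding has the high bit set.
    assert!("\u{1F4A9}".bytes().all(|b| b >= 0x80));

    // Byte-level and char-level counting agree for an ASCII needle.
    let by_bytes = count_byte(s, b'a');
    let by_chars = s.chars().filter(|&c| c == 'a').count();
    assert_eq!(by_bytes, by_chars);
    println!("{by_bytes}");
}
```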

Rust's char does not support arithmetic operators directly, which is a shame; you have to go through u8 or u32.

Doing math on one byte of a UTF-8 string is as valid as it's ever been.
