
List of valid ASCII characters for Objective-C literals and identifiers?

These variable names below are ALL VALID in Xcode (compiler builds them without a second thought).

NSString * ª_name = @"something";
NSString * ø_name = @"something";
NSString * ƒ_name = @"something";
NSString * Ç_name = @"something";
NSString * ç_name = @"something";
NSString * º_name = @"something";
NSString * ı_name = @"something";
NSString * ·name = @"SHIFT+OPTION+9"; // Personal favourite
NSString * π_name = @"something";
NSString * æ_name = @"something";

Is there a list somewhere I can see all the valid characters I can use for a variable name?


EDIT: Many of them still fool Xcode's indexing, actually. So it's better to stick to the good ol' underscore. :)

C90 allows 63 well known ASCII characters in identifiers: Latin upper- and lowercase letters, digits and the underscore.
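To make the 63-character rule concrete, here is a minimal sketch of a checker. The function names are made up for illustration, it assumes an ASCII-compatible execution character set (so that 'A'..'Z' and 'a'..'z' are contiguous), and it does not reject keywords.

```c
#include <stddef.h>

/* Returns 1 if c is one of the 63 characters C90 allows in an identifier:
   A-Z, a-z, 0-9 or underscore. Explicit ranges are used instead of
   isalnum() so the result does not depend on the current locale.
   Assumes an ASCII-compatible character set. */
static int is_c90_identifier_char(char c)
{
    return (c >= 'A' && c <= 'Z') ||
           (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '_';
}

/* Returns 1 if s is a non-empty string made only of those characters
   and does not start with a digit. Keywords are not checked. */
int is_c90_identifier(const char *s)
{
    size_t i;

    if (s == NULL || s[0] == '\0' || (s[0] >= '0' && s[0] <= '9'))
        return 0;
    for (i = 0; s[i] != '\0'; i++) {
        if (!is_c90_identifier_char(s[i]))
            return 0;
    }
    return 1;
}
```

With this, is_c90_identifier("_name") is accepted, while a name starting with the UTF-8 bytes of ª is rejected, because those bytes fall outside the 63 characters.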

Since C99 there is the notion of "extended identifiers", which newer versions of both clang and gcc support. Annex D of the C99 standard contains a list of the allowed "universal characters". It's a bit long, so I only copy the Latin range:

00AA, 00BA, 00C0−00D6, 00D8−00F6, 00F8−01F5, 01FA−0217, 0250−02A8, 1E00−1E9B, 1EA0−1EF9, 207F

Your favorite "middle dot" (which I like as well) is found in the "Special characters":

00B5, 00B7, 02B0−02B8, 02BB, 02BD−02C1, 02D0−02D1, 02E0−02E4, 037A, 0559, 093D, 0B3D, 1FBE, 203F−2040, 2102, 2107, 210A−2113, 2115, 2118−211D, 2124, 2126, 2128, 212A−2131, 2133−2138, 2160−2182, 3005−3007, 3021−3029
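If you want to check a code point against the two lists quoted above, a table lookup is enough. This is only a sketch: the table and function names are hypothetical, and it covers just the Latin and "Special characters" ranges copied here, not the rest of Annex D.

```c
#include <stddef.h>

struct cp_range { unsigned long lo, hi; };

/* Only the two lists quoted above (Latin and "Special characters"),
   not the whole of C99 Annex D. */
static const struct cp_range quoted_ranges[] = {
    /* Latin */
    {0x00AA, 0x00AA}, {0x00BA, 0x00BA}, {0x00C0, 0x00D6}, {0x00D8, 0x00F6},
    {0x00F8, 0x01F5}, {0x01FA, 0x0217}, {0x0250, 0x02A8}, {0x1E00, 0x1E9B},
    {0x1EA0, 0x1EF9}, {0x207F, 0x207F},
    /* Special characters */
    {0x00B5, 0x00B5}, {0x00B7, 0x00B7}, {0x02B0, 0x02B8}, {0x02BB, 0x02BB},
    {0x02BD, 0x02C1}, {0x02D0, 0x02D1}, {0x02E0, 0x02E4}, {0x037A, 0x037A},
    {0x0559, 0x0559}, {0x093D, 0x093D}, {0x0B3D, 0x0B3D}, {0x1FBE, 0x1FBE},
    {0x203F, 0x2040}, {0x2102, 0x2102}, {0x2107, 0x2107}, {0x210A, 0x2113},
    {0x2115, 0x2115}, {0x2118, 0x211D}, {0x2124, 0x2124}, {0x2126, 0x2126},
    {0x2128, 0x2128}, {0x212A, 0x2131}, {0x2133, 0x2138}, {0x2160, 0x2182},
    {0x3005, 0x3007}, {0x3021, 0x3029}
};

/* Returns 1 if the code point cp falls into one of the quoted ranges. */
int in_quoted_annex_d_ranges(unsigned long cp)
{
    size_t i;
    size_t n = sizeof quoted_ranges / sizeof quoted_ranges[0];

    for (i = 0; i < n; i++) {
        if (cp >= quoted_ranges[i].lo && cp <= quoted_ranges[i].hi)
            return 1;
    }
    return 0;
}
```

For the names in the question, U+00AA (ª), U+00F8 (ø) and U+00B7 (·) are found, while U+00D7 (×) is not. U+03C0 (π) is not found either, but only because it sits in the Greek range of Annex D, which is not copied above.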

Identifiers like these are actually "implementation-defined".

So, when you write

NSString * ª_name = @"something";

a particular compiler may choke on it. The code is no longer portable, and there is no guarantee that it will still compile with the next version of the same compiler.

According to C11, §6.4.2 Identifiers:

2 An identifier is a sequence of nondigit characters (including the underscore _, the lowercase and uppercase Latin letters, and other characters) and digits, which designates one or more entities as described in 6.2.1. Lowercase and uppercase letters are distinct. There is no specific limit on the maximum length of an identifier.

3 Each universal character name in an identifier shall designate a character whose encoding in ISO/IEC 10646 falls into one of the ranges specified in D.1. The initial character shall not be a universal character name designating a character whose encoding falls into one of the ranges specified in D.2. An implementation may allow multibyte characters that are not part of the basic source character set to appear in identifiers; which characters and their correspondence to universal character names is implementation-defined.

(emphasis mine)

In the GCC documentation, we can read under

11.1 Implementation-defined behavior

  • Identifier characters.

The C and C++ standards allow identifiers to be composed of '_' and the alphanumeric characters. C++ and C99 also allow universal character names, and C99 further permits implementation-defined characters. GCC currently only permits universal character names if -fextended-identifiers is used, because the implementation of universal character names in identifiers is experimental. ...

Note that a Universal character name is defined as

    universal-character-name: \u hex-quad
                              \U hex-quad hex-quad
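The grammar above is easy to turn into a small recognizer. The following is a sketch under the assumption that you only want to check the spelling of a universal character name in a string; ucn_length is a made-up name, and the function does not check whether the value lies in the Annex D ranges.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns the length of the universal character name at the start of s:
   6 for \u followed by one hex-quad, 10 for \U followed by two hex-quads,
   or 0 if s does not start with one. Only the syntax is checked. */
size_t ucn_length(const char *s)
{
    size_t digits, i;

    if (s == NULL || s[0] != '\\')
        return 0;
    if (s[1] == 'u')
        digits = 4;            /* one hex-quad */
    else if (s[1] == 'U')
        digits = 8;            /* two hex-quads */
    else
        return 0;
    for (i = 0; i < digits; i++) {
        /* Stops at the terminating '\0' too, since it is not a hex digit. */
        if (!isxdigit((unsigned char)s[2 + i]))
            return 0;
    }
    return 2 + digits;
}
```

Given the six characters \u00AA followed by _name, the function returns 6. It returns 0 for a truncated or non-hexadecimal sequence.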
