
Spaces required between keyword and literal

Looking at the output of UglifyJS2, I noticed that no space is required between a string literal and the in operator (e.g., 'foo'in{foo:'bar'} is valid).
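A quick way to check this (a minimal sketch; the Function constructor is just used here so the snippet is parsed at runtime rather than at load time):

```javascript
// A string literal can directly abut the `in` operator with no space.
// `new Function` parses the body string, so a parse error would throw here.
const result = new Function("return 'foo'in{foo:'bar'}")();
console.log(result); // true: the object has a 'foo' property
```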

Playing around with Chrome's DevTools, however, I noticed that hex and binary number literals require a space before the in keyword:

[screenshot of the DevTools console tests]

Internet Explorer returned true for all three tests, while Firefox 48.0.1 threw a SyntaxError for the first one ( 1in foo ); it is, however, fine with string literals ( '1'in foo evaluates to true ).
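For what it's worth, current spec-conforming engines (e.g. recent V8) now reject the numeric case while still accepting the string case; a small check (hedged: the exact behavior at the time of writing varied by browser, as described above):

```javascript
// Numeric literal immediately followed by `in`: a SyntaxError
// in a spec-conforming engine.
let numericThrew = false;
try {
  new Function("return 1in {1: true}");
} catch (e) {
  numericThrew = e instanceof SyntaxError;
}
console.log(numericThrew); // true in current V8

// String literal immediately followed by `in`: always valid,
// and '1' is a property key of {1: true}.
const stringOk = new Function("return '1'in {1: true}")();
console.log(stringOk); // true
```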

It seems there should be no problem parsing JavaScript while allowing keywords to sit next to numeric literals, but I can't find any explicit rule about this in any edition of the ECMAScript specification.

Further testing shows that statements like for(var i of[1,2,3])... are allowed in both Chrome and Firefox (IE11 doesn't support for...of loops), and typeof"string" works in all three.
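Both of those forms can be demonstrated directly (a minimal sketch of the two cases just mentioned):

```javascript
// Keywords may directly abut other kinds of tokens:
let sum = 0;
for (const i of[1, 2, 3]) sum += i; // no space after `of`
console.log(sum); // 6

// No space after `typeof` before a string literal:
console.log(typeof"string"); // "string"
```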

Which behavior is correct? Is it, in fact, defined somewhere that I missed, or are all these effects a result of idiosyncrasies of each browser's parser?

Not an expert: I haven't written a JS compiler, but I have written others.

ecma-262.pdf is a bit vague, but it's clear that an expression such as 1 in foo should be parsed as three input elements, all of which are tokens. Each token is a CommonToken (11.5); in this case, we get a NumericLiteral, an IdentifierName (yes, in is an IdentifierName), and another IdentifierName. Exactly the same is true when parsing 0b1 in foo (see 11.8.3).

So, what happens when you take out the whitespace? It's not covered explicitly (as far as I can see), but it's common practice (in other languages) when writing a lexer to scan the longest character sequence that can match a token. The introduction to section 11 says pretty much exactly that:

The source text is scanned from left to right, repeatedly taking the longest possible sequence of code points as the next input element.

So, for 0b1in foo the lexer consumes 0b1, which matches a numeric literal, then reaches i, giving 0b1i, which doesn't match anything. So it passes the longest match ( 0b1 ) to the rest of the parser as a token, and starts again at i. It finds n, followed by whitespace, so it passes in as the second token, and so on.
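That "longest possible sequence" strategy (maximal munch) can be sketched with a toy lexer. The token patterns below are deliberately simplified, and note that this naive version happily splits 0b1in into three tokens because it implements plain maximal munch with no extra restrictions:

```javascript
// Toy maximal-munch lexer: repeatedly take the longest prefix of the
// source that matches any token pattern. Patterns are simplified.
const patterns = [
  /^0b[01]+/,                  // binary numeric literal
  /^[0-9]+/,                   // decimal numeric literal
  /^[A-Za-z_$][A-Za-z0-9_$]*/, // identifier name (includes `in`)
  /^\s+/,                      // whitespace (matched but skipped)
];

function tokenize(src) {
  const tokens = [];
  while (src.length > 0) {
    let best = "";
    for (const p of patterns) {
      const m = src.match(p);
      if (m && m[0].length > best.length) best = m[0];
    }
    if (best === "") throw new SyntaxError("no match at: " + src);
    if (!/^\s+$/.test(best)) tokens.push(best); // drop whitespace
    src = src.slice(best.length);
  }
  return tokens;
}

console.log(tokenize("0b1in foo")); // ["0b1", "in", "foo"]
```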

So, basically, and rather bizarrely, it looks like IE is correct.

TL;DR There would be no change to how code would be interpreted if whitespace weren't required in these circumstances, but it's part of the spec.

Looking at the source code of v8 that handles number literal parsing, it cites ECMA-262 § 7.8.3:

The source character immediately following a NumericLiteral must not be an IdentifierStart or DecimalDigit.

NOTE For example:

3in

is an error and not the two input elements 3 and in.

This rule seems to contradict the introduction of section 7. However, it does not seem that there would be any problem with relaxing it and allowing 3in to be parsed as two tokens: there are cases where permitting no space between literals and identifiers would change how the source is parsed, but all of them merely change which error is generated.
