
Valid and Invalid HTML tags

Recently I came across a quiz question: "Which of the following is a valid tag?"

  1. <213person>
  2. <_person> (given as the correct answer)
  3. Both
  4. None

(Note: this is the explanation that was given: "Valid HTML tags are surrounded by angle brackets, and the tag name can only start with a letter or an underscore (_).")

As far as I know, none of the reserved tags start with an underscore, and from what I've read about custom HTML tags, a custom tag name has to start with a letter. (I tested it, and a custom tag starting with any character other than a letter doesn't work.) So, in my opinion and according to my tests, HTML tags can only start with a letter, or with ! (as in <!-- --> comments and <!DOCTYPE html>).

What I want to know is whether the given explanation is correct, and if it is, can someone point me to proper documentation and working examples for it?

As mentioned by @Rob, the standard defines a valid tag name as a string containing only ASCII alphanumeric characters, i.e.:

0-9|a-z|A-Z
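A minimal Python sketch of that authoring rule, assuming "valid" means only "non-empty and made up of 0-9, a-z and A-Z". The helper name is_conforming_tag_name is hypothetical, not a library function:

```python
import re

# Hypothetical helper for the authoring rule quoted above:
# a tag name uses only ASCII alphanumerics (0-9, a-z, A-Z).
_ASCII_ALNUM = re.compile(r"[0-9A-Za-z]+")


def is_conforming_tag_name(name: str) -> bool:
    """Return True if `name` is non-empty and uses only ASCII alphanumerics."""
    # fullmatch requires the whole string to match, so "_x" or "a!" fail.
    return _ASCII_ALNUM.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["person", "h1", "213person", "_person", "Z[\\]^_`a"]:
        print(candidate, is_conforming_tag_name(candidate))
```

Note that this check alone accepts 213person, because the rule as quoted says nothing about the first character; browsers treat a leading digit differently when parsing.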

However, browsers handle things differently.

There are a few main points I've noticed that don't align with the current standard.

Tag names must start with a letter

If a tag name starts with any character outside a-z|A-Z, the start tag ends up being interpreted as text and the end tag gets converted into a comment.
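This behaviour comes from the "tag open" and "end tag open" states of the HTML tokenizer: after <, only !, /, ? or an ASCII letter lead anywhere other than plain text, and after </ anything except a letter (or >) starts a bogus comment. A rough Python sketch, assuming the input starts with < and looking only at the first one or two characters after it (classify_tag_open and its return labels are hypothetical, for illustration only):

```python
def classify_tag_open(markup: str) -> str:
    """Roughly classify what an HTML tokenizer does with markup starting at '<'.

    Simplified sketch of the WHATWG "tag open" and "end tag open" states;
    only the first one or two characters after '<' are inspected.
    """
    if not markup.startswith("<"):
        raise ValueError("markup must start with '<'")
    rest = markup[1:]
    first = rest[:1]
    if first == "!":
        # <!-- ... --> and <!DOCTYPE html>; other "<!" forms become bogus comments
        return "markup declaration"
    if first == "?":
        return "bogus comment"
    if first == "/":
        second = rest[1:2]
        if second.isascii() and second.isalpha():
            return "end tag"
        if second == ">":
            return "ignored"  # "</>" is dropped entirely
        if second == "":
            return "text"  # "</" at end of input is emitted as text
        return "bogus comment"  # e.g. "</_person>" becomes <!--_person-->
    if first.isascii() and first.isalpha():
        return "start tag"
    return "text"  # e.g. "<_person>": the "<" is emitted as a plain character


if __name__ == "__main__":
    for sample in ["<person>", "<_person>", "<213person>", "</_person>", "<!DOCTYPE html>"]:
        print(f"{sample!r:20} {classify_tag_open(sample)}")
```

Under this sketch both <213person> and <_person> fall into the "text" branch, while their end tags become comments.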

Special characters can be used

The following HTML is valid in a lot of browsers and will create an element:

<Z[\]^_`a></Z[\]^_`a>

At first glance, this looks like browsers only check that the characters are ASCII, with the exception of the first character (as stated above).

Initially, I thought this was a simplified check: instead of [A-Z]|[a-z], they checked [A-z], which also covers the characters between Z and a. But it turns out you can use characters outside this range as well.
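The [A-z] guess does explain the first example: in ASCII, the six characters between Z (0x5A) and a (0x61) are exactly [ \ ] ^ _ and the backtick, which form the middle of <Z[\]^_`a>. A quick Python check of the two character classes, for illustration only:

```python
import re

# [A-z] spans U+0041 ('A') to U+007A ('z'), so it also contains the six
# ASCII characters that sit between 'Z' and 'a'.
between = [chr(c) for c in range(ord("Z") + 1, ord("a"))]
print(between)  # ['[', '\\', ']', '^', '_', '`']

letters_only = re.compile(r"[A-Za-z]")
sloppy_range = re.compile(r"[A-z]")

for ch in between:
    # Each one matches the sloppy range but not the letters-only class.
    print(repr(ch), bool(sloppy_range.fullmatch(ch)), bool(letters_only.fullmatch(ch)))
```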

This makes the following HTML also "valid" in the eyes of certain browsers:

<a!></a!>
<aʬ></aʬ>
<a͢͢͢></a͢͢͢>
<a͢͢͢ʬ͢ʬ͢ʬ͢ʬ͢ʬ͢ʬ͢ΘΘΘΘ></a͢͢͢ʬ͢ʬ͢ʬ͢ʬ͢ʬ͢ʬ͢ΘΘΘΘ>
<a></a>

I tested these elements in both Chrome and Firefox; I didn't test any other browsers. I also didn't test every character, just some with very high and very low character codes.
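These results appear to line up with the "tag name" state of the HTML tokenizer: once the first character is an ASCII letter, each following character is appended to the name until whitespace, / or > is reached. ASCII capitals are lower-cased and NULL is replaced with U+FFFD; nothing else is checked, so non-ASCII characters pass straight through. A simplified Python sketch of that rule (start_tag_name is a hypothetical helper; it ignores attributes and everything after the name):

```python
from typing import Optional

# Characters that end a tag name. The tokenizer itself only sees tab, LF,
# FF and space because CR is normalized to LF beforehand; '\r' is listed
# here so the sketch also works on raw input.
_NAME_TERMINATORS = "\t\n\f\r />"


def start_tag_name(markup: str) -> Optional[str]:
    """Return the tag name a browser would read from a start tag, or None.

    Simplified sketch of the WHATWG "tag open" and "tag name" states:
    the first character must be an ASCII letter; after that every character
    belongs to the name until whitespace, '/' or '>' is reached.
    """
    if len(markup) < 2 or markup[0] != "<":
        return None
    first = markup[1]
    if not (first.isascii() and first.isalpha()):
        return None  # parsed as text, so no element is created
    name = []
    for ch in markup[1:]:
        if ch in _NAME_TERMINATORS:
            return "".join(name)
        if "A" <= ch <= "Z":
            ch = ch.lower()  # only ASCII upper-case letters are lower-cased
        elif ch == "\0":
            ch = "\ufffd"  # NULL is replaced with U+FFFD
        name.append(ch)
    return None  # input ended inside the tag, so the tag is dropped


if __name__ == "__main__":
    for tag in ["<Z[\\]^_`a>", "<a!>", "<a\u02ac>", "<A\u0398>", "<_person>"]:
        print(repr(tag), "->", repr(start_tag_name(tag)))
```

Note that "<A\u0398>" (A followed by the Greek capital theta) yields "a\u0398": the A is lower-cased, but the non-ASCII capital is kept as-is.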

From the HTML standard:

Start tags must have the following format:

The first character of a start tag must be a U+003C LESS-THAN SIGN character (<). The next few characters of a start tag must be the element's tag name.

So what is allowed in the element's tag name? This is defined just above:

Tags contain a tag name, giving the element's name. HTML elements all have names that only use ASCII alphanumerics. In the HTML syntax, tag names, even those for foreign elements, may be written with any mix of lower- and uppercase letters that, when converted to all-lowercase, matches the element's tag name; tag names are case-insensitive.
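The case-insensitivity rule in the last sentence can be sketched in Python as follows. This assumes, as the tokenizer does, that "converted to all-lowercase" folds only the ASCII letters A-Z, and that the element's name is already given in lowercase; tag_name_matches is a hypothetical helper, not a standard API:

```python
import string

# Maps only A-Z to a-z; every other character is left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lowercase(s: str) -> str:
    return s.translate(_ASCII_LOWER)


def tag_name_matches(written: str, element_name: str) -> bool:
    """Return True if a tag name as written in markup denotes `element_name`.

    `element_name` is assumed to be in lowercase already, e.g. "div" or "h1".
    """
    return ascii_lowercase(written) == element_name


if __name__ == "__main__":
    print(tag_name_matches("DIV", "div"))  # True
    print(tag_name_matches("Div", "div"))  # True
    # U+212A KELVIN SIGN: str.lower() maps it to "k", ASCII lowercasing does not.
    print("\u212abd".lower() == "kbd")  # True
    print(tag_name_matches("\u212abd", "kbd"))  # False
```

Using str.lower() here instead would be a subtle bug: it also folds some non-ASCII characters, such as the Kelvin sign, onto ASCII letters.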
