
How to replace &nbsp; with &#160; in an html file

I want to replace all the &nbsp; with &#160; in my html file to support an XML parser. But I don't want to replace them directly; I'd like to declare an entity in the <!DOCTYPE> instead, like below:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"[<!ENTITY nbsp "&#160;">]> <html><head></head><body><div>Hello&nbsp;World!</div></body></html>

But when I view the file, there is an extra ]> at the top of the document.

Anyone know how to deal with it?

Thanks!

What you have is a valid way to include an entity declaration in an internal subset. The document is not otherwise valid, though, as you can check with the W3C Markup Validator: the required xmlns attribute on the html element is missing, and so is the required title element.

When served as text/html, the document is processed the way browsers have traditionally processed HTML documents, which means among other things that internal subsets are not recognized. In fact, document type definitions are not read at all; instead, doctype declarations are just taken as magic strings, so that some strings trigger “quirks mode” and some don't. The doctype declaration is parsed in a simplistic manner, which makes the first “>” terminate it, so whatever comes after it is taken as character data. In your document the first “>” is the one that closes the entity declaration, which leaves the “]>” behind as visible text.
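To make the effect concrete, here is a minimal Python sketch of that simplistic rule. It is not a real HTML parser; the helper name split_at_first_gt is made up for this illustration, and it only assumes that the declaration ends at the first “>”:

```python
# Illustration only: mimic the "first > ends the doctype" rule described above.
DOCUMENT = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"'
    '[<!ENTITY nbsp "&#160;">]> '
    '<html><head></head><body><div>Hello&nbsp;World!</div></body></html>'
)


def split_at_first_gt(markup):
    """Split markup into (declaration, remainder) at the first '>' character."""
    end = markup.index(">") + 1
    return markup[:end], markup[end:]


doctype, rest = split_at_first_gt(DOCUMENT)
# The remainder is what ends up being treated as page content.
print(rest[:2])
```

The remainder begins with the leftover characters of the internal subset, which is the stray text seen at the top of the page.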

The moral is that entity declarations just don't work with “HTML”, internally or externally, when “HTML” means sending something to a browser and telling it (in HTTP headers) that it is text/html, and that's what servers normally do when they send .html files.

Served as application/xhtml+xml and fixed to conform to XHTML syntax, your approach works in conforming browsers (online demo: http://www.cs.tut.fi/~jkorpela/test/nbsp.xhtml):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
 [<!ENTITY nbsp "&#160;">]>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Entity demo</title></head>
<body>
  <div>Hello&nbsp;World!</div>
</body>
</html>

However, IE 8 and earlier don't process documents served as application/xhtml+xml at all (the browser just launches a “Save As” dialog).
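Outside browsers, the same document can be checked with an ordinary XML parser. The sketch below uses Python's standard xml.etree.ElementTree as a stand-in for whatever XML parser you need to support (that choice is an assumption, not something from the question). The parser does not fetch the external DTD, so the declaration in the internal subset is what resolves &nbsp;:

```python
import xml.etree.ElementTree as ET

# The corrected XHTML document from above, with nbsp declared in the internal subset.
XHTML = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
 [<!ENTITY nbsp "&#160;">]>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Entity demo</title></head>
<body>
  <div>Hello&nbsp;World!</div>
</body>
</html>"""

root = ET.fromstring(XHTML)
# Elements live in the XHTML namespace because of the xmlns attribute.
div = root.find(".//{http://www.w3.org/1999/xhtml}div")
div_text = div.text
print(repr(div_text))
```

If the internal subset is removed, the same call fails with a parse error about an undefined entity, which is the problem the declaration is meant to solve.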

The conclusions depend on what you are doing and why (and in which sense) you need to “support XML parser”. It's not really about parsing but about entity declarations. XHTML user agents are not required to understand the predefined entities of HTML (except for those defined in XML), but has this possibility actually materialized somewhere? And in general, it is better to convert &nbsp; to actual no-break space characters than to character references.
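A minimal sketch of that last suggestion, again in Python: a plain string replacement that turns each &nbsp; reference into the actual U+00A0 character. The function name replace_nbsp is hypothetical, and the replacement is deliberately naive (it would also touch &nbsp; inside comments or CDATA sections), so treat it as a starting point rather than a finished tool:

```python
import xml.etree.ElementTree as ET

NO_BREAK_SPACE = "\u00a0"  # U+00A0, the character that &nbsp; stands for


def replace_nbsp(markup):
    """Replace every &nbsp; reference with a literal no-break space."""
    return markup.replace("&nbsp;", NO_BREAK_SPACE)


source = "<div>Hello&nbsp;World!</div>"
converted = replace_nbsp(source)

# With the entity reference gone, no DOCTYPE or entity declaration is needed.
div = ET.fromstring(converted)
print(repr(div.text))
```

The converted file then has to be saved and served in an encoding that can represent U+00A0, such as UTF-8.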

