JavaScript or regex solution to make markup XHTML compliant

Question

I have an inline markup editor built into my website, which should produce XHTML-compliant markup. But as you can see, it uses the deprecated font tag and size attribute:

<font style="font-family: Courier New; color: rgb(0, 0, 153);" size="2">
   asdfa
   <span style="color: rgb(0, 51, 0);">
    a
    <font size="5">fds</font>
   </span>
</font>

On other browsers, it produces the  instead of .

Is there a JavaScript/regex solution for taking the first set of markup and replacing it with XHTML-compliant markup that uses the style attribute and span tag? Thanks in advance!!

(PS: jQuery can be used too.)

Answer 1

The markup above is perfectly valid in XHTML 1.0 Transitional.

Whether deprecated elements like <font> are used is a completely orthogonal issue to whether XHTML or HTML syntax is used. XHTML 1.0 is nothing more or less than a restatement of HTML 4.01 in XML syntax: consequently there are Transitional and Strict variants, just as there are for HTML 4.

<font> and <span style="..."> are semantically equally useless. If you want the markup to use a set of defined elements and classes that are meaningful in the context of your site, you'll have to hack the editor into using those, instead of basing it purely on visual formatting.

You could parse the XHTML and alter it as a later step, to try to make it look better. But, as previously mentioned, regex is not at all an adequate tool for this. You would need an XML parser: parse the markup, fix up the elements and attributes, then re-serialise the result to XHTML. It would be sensible to do this on the server side, because getting an XML parser on the client side is slightly tricky, and you will need to do it on the server side anyway if you're going to be cleaning out non-whitelisted elements and attributes.
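A minimal server-side sketch of that parse, fix up, re-serialise pipeline, written in Python with the standard library's `xml.etree.ElementTree` rather than the JavaScript the question asks for. It assumes the editor output is a well-formed XML fragment with no XHTML namespace declaration and no HTML-only entities such as `&nbsp;`. The function name `font_to_span` is hypothetical, and the size table is based on the HTML specification's legacy font-size mapping; neither comes from the original answer.

```python
import xml.etree.ElementTree as ET

# Legacy <font size="1".."7"> values mapped to CSS keywords (HTML spec legacy table).
FONT_SIZES = {"1": "x-small", "2": "small", "3": "medium", "4": "large",
              "5": "x-large", "6": "xx-large", "7": "xxx-large"}


def font_to_span(markup: str) -> str:
    """Rewrite every <font> in a well-formed XHTML fragment as <span style="...">."""
    # Wrap the fragment so input with several top-level nodes still parses.
    root = ET.fromstring("<div>" + markup + "</div>")
    # Snapshot the matches first, since the loop renames elements.
    for el in list(root.iter("font")):
        decls = []
        face = el.attrib.pop("face", None)
        if face:
            decls.append("font-family: " + face)
        color = el.attrib.pop("color", None)
        if color:
            decls.append("color: " + color)
        size = el.attrib.pop("size", None)
        if size in FONT_SIZES:  # relative sizes such as "+1" are dropped
            decls.append("font-size: " + FONT_SIZES[size])
        # An existing inline style already outranks the presentational
        # attributes, so it goes last and keeps winning in the cascade.
        existing = el.get("style", "").strip().rstrip(";").strip()
        if existing:
            decls.append(existing)
        if decls:
            el.set("style", "; ".join(decls))
        el.tag = "span"
    # Re-serialise the wrapper's contents; ET.tostring also writes each child's tail text.
    return (root.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in root
    )
```

For example, `font_to_span('x<font color="red">y</font>z')` returns `'x<span style="color: red">y</span>z'`. The existing `style` value is placed after the converted attributes because browsers already let an inline style override presentational attributes such as `size`, so the rendering should stay the same.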

Answer 2

I wouldn't recommend regex for that sort of job (see the greatest 'Regex to Parse HTML' answer ever!). I know you're not talking about a full-on parser, but I still think you'd be best off with JavaScript (or whichever back-end language you're using) and a library tailored to parsing HTML.
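If the editor's output cannot be trusted to be well-formed XML, a tolerant HTML parser is the kind of library meant here. The sketch below uses Python's standard `html.parser.HTMLParser` (a swapped-in choice, not something the answer names) to re-emit the markup with every `<font>` turned into a `<span style="...">`. It is an illustration under assumptions: comments and doctypes are discarded, boolean attributes are written in XHTML's `name="name"` form, unclosed tags are passed through unrepaired, and the input is assumed to contain no `<script>` or `<style>` content. `FontToSpan` and `font_to_span_html` are hypothetical names.

```python
from html import escape
from html.parser import HTMLParser

# Legacy <font size="1".."7"> values mapped to CSS keywords (HTML spec legacy table).
FONT_SIZES = {"1": "x-small", "2": "small", "3": "medium", "4": "large",
              "5": "x-large", "6": "xx-large", "7": "xxx-large"}


class FontToSpan(HTMLParser):
    """Re-emit parsed HTML, replacing <font> with <span style="...">."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so text arrives with entities decoded
        self.out = []

    @staticmethod
    def _font_attrs(attrs):
        decls = []
        if attrs.get("face"):
            decls.append("font-family: " + attrs["face"])
        if attrs.get("color"):
            decls.append("color: " + attrs["color"])
        if attrs.get("size") in FONT_SIZES:  # relative sizes such as "+1" are dropped
            decls.append("font-size: " + FONT_SIZES[attrs["size"]])
        # An existing inline style goes last so it keeps priority in the cascade.
        existing = (attrs.get("style") or "").strip().rstrip(";").strip()
        if existing:
            decls.append(existing)
        return [("style", "; ".join(decls))] if decls else []

    def _open(self, tag, attrs, self_closing=False):
        if tag == "font":
            tag, attrs = "span", self._font_attrs(dict(attrs))
        # Boolean attributes (value None) are written as name="name" for XHTML.
        rendered = "".join(
            ' %s="%s"' % (name, escape(name if value is None else value))
            for name, value in attrs
        )
        self.out.append("<%s%s%s>" % (tag, rendered, " /" if self_closing else ""))

    def handle_starttag(self, tag, attrs):
        self._open(tag, attrs)

    def handle_startendtag(self, tag, attrs):  # e.g. <br/>
        self._open(tag, attrs, self_closing=True)

    def handle_endtag(self, tag):
        self.out.append("</%s>" % ("span" if tag == "font" else tag))

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))  # re-escape &, < and >


def font_to_span_html(markup: str) -> str:
    parser = FontToSpan()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

For example, `font_to_span_html('<font size=2>x</font>')` returns `'<span style="font-size: small">x</span>'`: the unquoted attribute value, which an XML parser would reject, is accepted here.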

You may want to look at the Tidy open source project over on SourceForge. There's an intro/overview at IBM: "Convert from HTML to XML with HTML Tidy".

Answer 3

If implementing a different WYSIWYG editor in your application is an option, check out CKEditor.

3 answers

Answer 1: 2 votes, accepted, 2010-09-01 17:39:30

Answer 2: 1 vote, 2010-09-01 17:17:39

Answer 3: 0 votes, 2010-09-01 17:00:05
