Parsing the HTML "style" attribute with Java

Question

I have HTML code parsed into an org.w3c.dom.Document. I need to check the style attribute of every tag, parse it, change some CSS properties, and put the modified style definition back into the attribute.

Is there a standard way to parse the style attribute? How can I use the classes and interfaces from the org.w3c.dom.css package?

I need a Java solution.

Answer 1

If you want a way to do this without any dependencies, you can use the classes in the javax.swing.text.html package to get most of the way there:

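import javax.swing.text.AttributeSet;
import javax.swing.text.html.*;

StyleSheet styleSheet = new StyleSheet();
// Parse a declaration string as it would appear in a style attribute.
AttributeSet dec = styleSheet.getDeclaration("margin:2px;padding:3px");
// Look up a single property by its CSS.Attribute key.
Object marginLeft = dec.getAttribute(CSS.Attribute.MARGIN_LEFT);
String marginLeftString = marginLeft.toString(); // "2px"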

This returns a StyleSheet.CssValue, which is unfortunately not public, hence the need to convert it to a String. Also, it won't handle em units. It is fairly smart about various styles, though. Not ideal, but it avoids dependencies.
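If the Swing classes are too limiting, the parse / change / write-back cycle from the question can also be sketched with plain string handling. This is only a minimal sketch, not a real CSS parser: InlineStyle, parse, and serialize are hypothetical names, and it assumes that no value contains a ';' (which would break for quoted strings or data URLs).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of the JDK or any library.
final class InlineStyle {
    private InlineStyle() {}

    /** Splits "a:b;c:d" into an ordered property -> value map. */
    static Map<String, String> parse(String style) {
        Map<String, String> props = new LinkedHashMap<>();
        if (style == null) {
            return props;
        }
        // Assumption: values never contain ';', so a plain split is enough.
        for (String decl : style.split(";")) {
            int colon = decl.indexOf(':');
            if (colon <= 0) {
                continue; // skip empty or malformed declarations
            }
            String name = decl.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = decl.substring(colon + 1).trim();
            if (!name.isEmpty() && !value.isEmpty()) {
                props.put(name, value);
            }
        }
        return props;
    }

    /** Joins the map back into a string suitable for a style attribute. */
    static String serialize(Map<String, String> props) {
        StringJoiner out = new StringJoiner("; ");
        props.forEach((name, value) -> out.add(name + ": " + value));
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = parse("margin:2px; padding:3px;");
        props.put("margin", "0");   // change an existing property
        props.remove("padding");    // drop one
        props.put("color", "red");  // add a new one
        System.out.println(serialize(props)); // margin: 0; color: red
    }
}
```

A LinkedHashMap keeps the declarations in their original order, so the rewritten attribute stays close to what the author wrote.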

Answer 2

First, I would check out the classes in the javax.xml packages. The javax.xml.parsers package contains parsers for two styles of parsing: SAXParser and DocumentBuilder. It sounds like you want the DocumentBuilder to create a DOM. You can either traverse the DOM manually (slow and painful), or you can use the XPath standard to look up elements in the DOM. Java's support for that is in javax.xml.xpath.

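// compile() is an instance method, so obtain an XPath object from the factory first.
XPathExpression xpath = XPathFactory.newInstance().newXPath().compile("//@style");
// With XPathConstants.NODESET the returned Object is a NodeList.
Object results = xpath.evaluate(dom, XPathConstants.NODESET);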

It's your responsibility to cast the results to a NodeList and iterate properly, but it's the most direct way to get at what you want. Check out Java's DOM API for more information about reading and changing values.
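A fuller sketch of the cast-and-iterate step might look like the following. It assumes the markup is well-formed XML (XHTML), since DocumentBuilder rejects ordinary tag-soup HTML. StyleAttributeFinder and its methods are hypothetical names, and the appended "display:block" is only a placeholder change.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
final class StyleAttributeFinder {
    private StyleAttributeFinder() {}

    /** Parses well-formed markup into a DOM; fails on non-XML HTML. */
    static Document parseXhtml(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse markup", e);
        }
    }

    /** Returns every style attribute node in the document. */
    static NodeList findStyleAttributes(Document dom) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//@style", dom, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        Document dom = parseXhtml(
                "<html><body><p style=\"margin:2px\">a</p>"
                + "<div style=\"color:red\"><span>b</span></div></body></html>");
        NodeList styles = findStyleAttributes(dom);
        for (int i = 0; i < styles.getLength(); i++) {
            Attr style = (Attr) styles.item(i); // each match is an attribute node
            System.out.println(style.getOwnerElement().getTagName() + " -> " + style.getValue());
            style.setValue(style.getValue() + ";display:block"); // write the new value back
        }
    }
}
```

Because each result is an Attr node, calling setValue on it updates the owning element in place, so no second lookup is needed.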

I don't believe Java has a built-in CSS parser, but you can look at these projects:

These may help you with your goals. NOTE: the Batik CSS parser is incorporated into the larger Apache Batik project (http://xmlgraphics.apache.org/batik/index.html), which may have more than you need, but it has a corporate-friendly license.

Answer 3

I'm not sure I completely understand your requirements, but basically, you'll have to:

Read the stylesheet(s) and extract the CSS rules.
Read the HTML page(s) and find the attributes.
Substitute the new CSS properties for the old CSS properties.
Write the HTML page(s).

It looks like you would use the CSSStyleSheet interface to extract the CSS rules from the stylesheet(s).
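The last three steps (read the page, substitute, write it out) can be sketched with the JDK's DOM and transform APIs. This is an illustration under assumptions, not a complete solution: StyleRewriter and rewrite are hypothetical names, the input must be well-formed XML, the substitution is a crude string replacement rather than real CSS parsing, and reading external stylesheets (the first step) is not covered.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
final class StyleRewriter {
    private StyleRewriter() {}

    /** Replaces oldDecl with newDecl in every style attribute and returns the serialized markup. */
    static String rewrite(String markup, String oldDecl, String newDecl) {
        try {
            // Read the page (assumed to be well-formed XML).
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));

            // Find the attributes and substitute the new declaration.
            NodeList all = dom.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element el = (Element) all.item(i);
                if (el.hasAttribute("style")) {
                    el.setAttribute("style", el.getAttribute("style").replace(oldDecl, newDecl));
                }
            }

            // Write the page back out as XML without an XML declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not rewrite markup", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><p style=\"color:red\">hi</p></body></html>";
        System.out.println(rewrite(page, "color:red", "color:blue"));
    }
}
```

The output method is set to "xml" explicitly so that the serializer does not switch to HTML output rules for a document whose root element is html.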

3 answers

Answer 1
3 2015-09-25 20:57:01

Answer 2
1 2010-11-23 13:33:32

Answer 3
0 2010-11-23 13:34:22
