Parse HTML “style” attribute using Java
I have HTML code parsed into an org.w3c.dom.Document. I need to check every tag's style attribute, parse it, change some CSS properties, and write the modified style definition back to the attribute.
Is there a standard way to parse the style attribute? How can I use the classes and interfaces from the org.w3c.dom.css package?
I need a Java solution.
If you want to do this without any dependencies, you can use the classes in the javax.swing.text.html package to get most of the way there:
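import javax.swing.text.AttributeSet;
import javax.swing.text.html.CSS;
import javax.swing.text.html.StyleSheet;

StyleSheet styleSheet = new StyleSheet();
// Parse the declarations of an inline style attribute
AttributeSet dec = styleSheet.getDeclaration("margin:2px;padding:3px");
Object marginLeft = dec.getAttribute(CSS.Attribute.MARGIN_LEFT);
String marginLeftString = marginLeft.toString(); // "2px"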
getAttribute returns a StyleSheet.CssValue, which is unfortunately not public, hence the need to convert it to a String. It also won't handle em units. It is fairly smart about the various style properties, though (the example above reads margin-left from a margin shorthand). Not ideal, but it avoids dependencies.
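To see everything Swing extracted from a style string, the returned AttributeSet can be enumerated. The following is only a sketch: the class name SwingStyleDump and the helper declarations are made up for illustration, and which properties show up (and in what textual form) depends on what Swing's CSS support recognizes.

```java
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.StyleSheet;

// Hypothetical helper class, not part of the original answer.
class SwingStyleDump {
    /** Parses an inline style string with Swing and returns property name -> value text. */
    static Map<String, String> declarations(String style) {
        AttributeSet dec = new StyleSheet().getDeclaration(style);
        Map<String, String> out = new LinkedHashMap<>();
        Enumeration<?> names = dec.getAttributeNames();
        while (names.hasMoreElements()) {
            Object name = names.nextElement();
            // The value type is not public, so fall back to its string form.
            out.put(name.toString(), String.valueOf(dec.getAttribute(name)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(declarations("color: red; margin: 2px"));
    }
}
```

This only covers reading. Swing offers no obvious way to serialize the set back into a style string, so writing the attribute back would still need your own code.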
First, I would check out the classes in the javax.xml packages. The javax.xml.parsers package contains parsers for two styles of parsing: SAXParser and DocumentBuilder. It sounds like you want the DocumentBuilder to create a DOM. You can either traverse the DOM manually (slow and painful), or use the XPath standard to look up elements in the DOM. Java's support for that is in javax.xml.xpath.
XPath xpath = XPathFactory.newInstance().newXPath(); // compile() is an instance method
XPathExpression expr = xpath.compile("//@style");
Object results = expr.evaluate(dom, XPathConstants.NODESET);
It's your responsibility to cast the results to a NodeList and iterate properly, but it's the most direct way to get at what you want. Check out Java's DOM API for more information about reading and changing values.
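Putting the pieces together, here is a minimal end-to-end sketch: select every style attribute with XPath, split the value into declarations, change one property, and write the result back. The class StyleRewriter and its methods are hypothetical names, and the splitting on ";" and ":" is a deliberately naive stand-in for a real CSS parser (it will break on values such as url(data:...) or quoted strings that contain those characters).

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; the parsing here is naive string splitting.
class StyleRewriter {
    /** Splits "a: b; c: d" into an ordered map of property -> value. */
    static Map<String, String> parseStyle(String style) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String decl : style.split(";")) {
            int colon = decl.indexOf(':');
            if (colon < 0) continue; // skip empty or malformed chunks
            String name = decl.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = decl.substring(colon + 1).trim();
            if (!name.isEmpty()) props.put(name, value);
        }
        return props;
    }

    /** Serializes the map back into a style attribute value. */
    static String toStyle(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (sb.length() > 0) sb.append("; ");
            sb.append(e.getKey()).append(": ").append(e.getValue());
        }
        return sb.toString();
    }

    /** Sets one property on every style attribute found in the document. */
    static void setProperty(Document dom, String name, String value) throws Exception {
        NodeList attrs = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//@style", dom, XPathConstants.NODESET);
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            Map<String, String> props = parseStyle(attr.getNodeValue());
            props.put(name, value);
            attr.setNodeValue(toStyle(props));
        }
    }

    /** Parses well-formed markup into a DOM; real-world HTML may need a lenient parser first. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```

For example, StyleRewriter.setProperty(dom, "color", "green") would overwrite or add a color declaration on each element that already has a style attribute, leaving the other declarations in their original order.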
I don't believe Java has a built-in CSS parser, but you can look at these projects:
They may help you with your goals. NOTE: the Batik CSS parser is part of the larger Apache Batik project (http://xmlgraphics.apache.org/batik/index.html), which may contain more than you need, but it has a corporate-friendly license.
I'm not sure I completely understand your requirements, but basically, you'll have to:
It looks like you would use the CSSStyleSheet interface to extract the CSS rules from the stylesheet(s).