Stripping out HTML tags from Java string with zero deps
I understand that this question is very similar to this one and others. I have the same question (how to strip out HTML tags from a Java string?) with the added constraint that I don't want to add any dependencies (Apache Commons, Spring, etc.) to my code.
So I'm looking for a "pure Java SE" flavor of the HTML tag-stripping algorithms used by a lot of these other frameworks, but I'm not sure exactly where to start. Thanks in advance.
Without using the HTMLEditorKit explicitly:
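import java.util.logging.Level;
import java.util.logging.Logger;
import javax.swing.JTextPane;
import javax.swing.text.BadLocationException;
import javax.swing.text.StyledDocument;

String html = "<html>...";
// Setting the content type makes the pane parse the string as HTML.
JTextPane pane = new JTextPane();
pane.setContentType("text/html");
pane.setText(html);
StyledDocument doc = pane.getStyledDocument();
try {
    // The document model holds only the text content, not the tags.
    System.out.println("Text: " + doc.getText(0, doc.getLength()));
} catch (BadLocationException ex) {
    // NewJFrame is the enclosing class here; substitute your own.
    Logger.getLogger(NewJFrame.class.getName()).log(Level.SEVERE, null, ex);
}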
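If you would rather not create a Swing component at all, the same JDK package also exposes the parser directly. The following is only a minimal sketch, assuming you are fine with the lenient HTML 3.2-era parser that ships with Java SE; the class name HtmlStripper and the method strip are made up for illustration and are not part of any library. It collects every text run the parser reports and joins the runs with a single space, so the exact whitespace in the result may differ from the source markup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper (name chosen for this example only).
final class HtmlStripper {

    private HtmlStripper() {
    }

    // Returns the text content of the given HTML string, with tags dropped.
    static String strip(String html) {
        final StringBuilder text = new StringBuilder();

        // The callback is notified for each piece of the document;
        // only the text runs are kept, tag events are ignored.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };

        try {
            // The third argument (ignoreCharSet) is true because the input
            // is already a decoded String, so any charset declaration in
            // the markup should not matter.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException ex) {
            // Not expected for an in-memory reader; rethrown unchecked.
            throw new UncheckedIOException(ex);
        }
        return text.toString();
    }
}
```

Calling HtmlStripper.strip("<p>Hello <b>world</b></p>") should give you a string with both words and no angle brackets. I have not checked how this parser treats every element (script and style content, for instance), so test it against the kind of markup you actually expect.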