Stripping out HTML tags from Java string with zero deps

I understand that this question is very similar to this one and others. I have the same question (how to strip out HTML tags from a Java string?), with the added constraint that I don't want to add any dependencies (Apache Commons, Spring, etc.) to my code.

So I'm looking for a "pure Java SE" flavor of the HTML tag-stripping algorithms used by a lot of these other frameworks, but I'm not sure exactly where to start. Thanks in advance.

Without using the HTMLEditorKit explicitly, you can let a JTextPane parse the HTML and read the plain text back from its document:

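    import javax.swing.JTextPane;
    import javax.swing.text.BadLocationException;
    import javax.swing.text.StyledDocument;

    // Swing's built-in HTML support parses the markup; the document then holds only the text.
    String html = "<html>...";
    JTextPane pane = new JTextPane();
    pane.setContentType("text/html");
    pane.setText(html);
    StyledDocument doc = pane.getStyledDocument();
    try {
        System.out.println("Text: " + doc.getText(0, doc.getLength()));
    } catch (BadLocationException ex) {
        // Not expected here: the requested range matches the document's bounds.
        ex.printStackTrace();
    }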
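
For comparison, here is a minimal sketch of the explicit route, which feeds the string to the Swing HTML parser directly and so does not need a text component. It assumes the `javax.swing.text.html` parser classes that ship with Java SE are acceptable as "zero deps". The class name `HtmlStripper` and the method `stripTags` are made up for illustration. Whitespace and entity handling follow the parser's own rules, so the result may not match what a browser would render.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlStripper {

    // Hypothetical helper: returns only the text content the parser reports.
    public static String stripTags(String html) throws IOException {
        final StringBuilder text = new StringBuilder();

        // The callback is invoked once per run of text; tags are simply ignored.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data);
            }
        };

        // The third argument (ignoreCharSet = true) skips charset handling for
        // <meta> tags, which is irrelevant for an in-memory string.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(stripTags("<p>Hello <b>world</b></p>"));
    }
}
```

If block-level elements should end up on separate lines, one option is to also override `handleStartTag` or `handleEndTag` in the callback and append a newline there.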
