Stripping out HTML tags from Java string with zero deps

Question

I understand that this question is very similar to this one and others. I have the same question (how to strip HTML tags from a Java string?) with the added constraint that I don't want to add any dependencies (Apache Commons, Spring, etc.) to my code.

So I'm looking for a "pure Java SE" flavor of the HTML tag-stripping algorithms used by a lot of these other frameworks, but I'm not sure exactly where to start. Thanks in advance.

Answer 1

Without using HTMLEditorKit explicitly, you can let a Swing JTextPane parse the HTML and then read the plain text back from its document:

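    import java.util.logging.Level;
    import java.util.logging.Logger;
    import javax.swing.JTextPane;
    import javax.swing.text.BadLocationException;
    import javax.swing.text.StyledDocument;

    String html = "<html>...";
    JTextPane pane = new JTextPane();
    pane.setContentType("text/html");  // switch the pane to HTML before setting the text
    pane.setText(html);                // the pane parses the markup into its document
    StyledDocument doc = pane.getStyledDocument();
    try {
        // getText returns the document's character content, without the markup
        System.out.println("Text: " + doc.getText(0, doc.getLength()));
    } catch (BadLocationException ex) {
        // NewJFrame is the enclosing class in the answerer's project; substitute your own
        Logger.getLogger(NewJFrame.class.getName()).log(Level.SEVERE, null, ex);
    }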
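
If you would rather not instantiate a Swing component at all, the same JDK parser can probably be driven directly. The following is only a minimal sketch, not part of the original answer: the class name HtmlTextExtractor and the method stripTags are hypothetical names chosen for illustration. It relies on javax.swing.text.html.parser.ParserDelegator and an HTMLEditorKit.ParserCallback, both of which ship with Java SE, and it collects only the text chunks the parser reports.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class; the name is an assumption made for this sketch.
public class HtmlTextExtractor {

    // Returns the text content the JDK's HTML parser reports for the given markup.
    public static String stripTags(String html) {
        final StringBuilder text = new StringBuilder();

        // The callback ignores tags and keeps only the text chunks.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data);
            }
        };

        try {
            // The third argument (ignoreCharSet = true) tells the parser not to
            // act on charset declarations, since the input is already a String.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException ex) {
            // A StringReader should not fail, but parse() declares IOException.
            throw new UncheckedIOException(ex);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<p>Hello <b>world</b></p>"));
    }
}
```

Exactly how whitespace between text chunks comes out depends on the parser, so treat the result as a starting point and normalize spacing yourself if it matters.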

solution1
0 2013-03-19 10:35:15