How to select elements that contain only whitespace with Jsoup?

Question

I'm having trouble selecting elements that contain only whitespace.

Given the HTML: <html><body><p> </p></body></html>

Using :empty does not select the p, presumably because it contains a " " text node.

However, :matchesOwn(^\\s+$) does not select it either, because Jsoup appears to call trim() on the text before testing it against the regex.

:matchesOwn(^$) does select it, but it also selects elements that have no non-empty text nodes, such as an empty <span></span>.

Am I missing something?

:matchesOwn shouldn't trim at all: it takes a regular expression, so it should evaluate the entire text.
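To make the suspected behaviour concrete, here is a plain-Java sketch that imitates "trim first, then apply the regex". It does not call jsoup at all; TrimThenMatch and matchesAfterTrim are hypothetical names, and the trim step is an assumption modelled on the description above.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not jsoup API: imitates a selector that trims the
// text before searching it with the given regex.
public class TrimThenMatch {

    // Trim first, then search, as the question suspects :matchesOwn does
    static boolean matchesAfterTrim(String text, String regex) {
        return Pattern.compile(regex).matcher(text.trim()).find();
    }

    public static void main(String[] args) {
        // ^\s+$ needs at least one whitespace character, but trim() removed them all
        System.out.println(matchesAfterTrim(" ", "^\\s+$")); // false
        // ^$ matches the trimmed whitespace-only text...
        System.out.println(matchesAfterTrim(" ", "^$"));     // true
        // ...but it equally matches text that was empty to begin with
        System.out.println(matchesAfterTrim("", "^$"));      // true
    }
}
```

Once the text is trimmed, " " and "" are indistinguishable, so no regex can separate whitespace-only elements from empty ones.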

Answer 1

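CSS selectors can only match one type of node: elements. A selector cannot find comment or text nodes. To find elements that contain only whitespace, we have to rely on the Jsoup API.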

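We'll look for nodes that have exactly one child node, and that child must be a text node. This single text node must match the regular expression ^\s+$. To get the (untrimmed) text, we call TextNode#getWholeText.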
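The difference between trimmed and untrimmed text can be seen directly. A small sketch, assuming jsoup is on the classpath; OwnTextVsWholeText and its two methods are illustrative names, not part of jsoup. It compares Element#ownText(), which trims and normalizes whitespace, with TextNode#getWholeText(), which returns the text as parsed.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

// Illustrative comparison of trimmed vs. untrimmed text of the first <p>
public class OwnTextVsWholeText {

    // Own text as jsoup reports it: whitespace-normalized and trimmed
    static String ownText(String html) {
        Element p = Jsoup.parse(html).select("p").first();
        return p.ownText();
    }

    // Raw text of the first <p>'s first child, assumed to be a TextNode
    static String wholeText(String html) {
        Element p = Jsoup.parse(html).select("p").first();
        return ((TextNode) p.childNode(0)).getWholeText();
    }

    public static void main(String[] args) {
        String html = "<html><body><p> </p></body></html>";
        System.out.println("[" + ownText(html) + "]");   // []
        System.out.println("[" + wholeText(html) + "]"); // [ ]
    }
}
```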

Here's how:

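import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeTraversor;
import org.jsoup.select.NodeVisitor;

public class WhitespaceOnlyElements {

    public static void main(String[] args) {
        String html = "<html><body><div><p> </p><p> </p><span>\n\t\n   </span></div><span></span></body></html>";

        Document doc = Jsoup.parse(html);

        // Reusable matcher: matches() is true when the whole text is whitespace
        final Matcher onlyWhitespaceMatcher = Pattern.compile("^\\s+$").matcher("");

        // Static form used by newer jsoup releases; older releases used
        // new NodeTraversor(visitor).traverse(doc) instead
        NodeTraversor.traverse(new NodeVisitor() {

            @Override
            public void head(Node node, int depth) {
                List<Node> childNodes = node.childNodes();
                // * We're looking for nodes with one child only, otherwise we move on
                if (childNodes.size() != 1) {
                    return;
                }

                // * This unique child node must be a TextNode
                Node uniqueChildNode = childNodes.get(0);
                if (!(uniqueChildNode instanceof TextNode)) {
                    return;
                }

                // * This unique TextNode must be whitespace only;
                //   getWholeText() returns the text untrimmed
                if (onlyWhitespaceMatcher.reset(((TextNode) uniqueChildNode).getWholeText()).matches()) {
                    System.out.println(node.nodeName());
                }
            }

            @Override
            public void tail(Node node, int depth) {
                // nothing to do when leaving a node
            }
        }, doc);
        // Instead of traversing the whole document,
        // we could narrow down the search to its body only with doc.body()
    }
}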

OUTPUT

p
p
span
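Since CSS selectors stop at elements, the same rule can also be applied by letting jsoup enumerate candidate elements and filtering them in plain Java. A minimal sketch, assuming jsoup is on the classpath; WhitespaceOnlyFilter, isWhitespaceOnly and whitespaceOnlyTags are hypothetical names, not jsoup API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

// Hypothetical alternative to the NodeVisitor: collect elements, then filter
public class WhitespaceOnlyFilter {

    // matches() must consume the whole input, so no ^/$ anchors are needed
    private static final Pattern ONLY_WHITESPACE = Pattern.compile("\\s+");

    // Same rule as above: exactly one child node, a TextNode whose
    // untrimmed text is whitespace only
    static boolean isWhitespaceOnly(Element el) {
        if (el.childNodeSize() != 1) {
            return false;
        }
        Node child = el.childNode(0);
        return child instanceof TextNode
                && ONLY_WHITESPACE.matcher(((TextNode) child).getWholeText()).matches();
    }

    // Tag names of matching elements inside <body>, in document order
    static List<String> whitespaceOnlyTags(String html) {
        List<String> tags = new ArrayList<>();
        for (Element el : Jsoup.parse(html).body().getAllElements()) {
            if (isWhitespaceOnly(el)) {
                tags.add(el.tagName());
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String html = "<html><body><div><p> </p><p> </p><span>\n\t\n   </span></div><span></span></body></html>";
        System.out.println(whitespaceOnlyTags(html)); // [p, p, span]
    }
}
```

Scanning only doc.body() follows the narrowing suggested in the code comment above; the empty <span></span> is skipped because it has no child node at all.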

Answer 1: score 0, posted 2016-01-26 11:28:11