How to get all text outside of elements in an HTML document

Question

I'm using Symfony DomCrawler to get all text in a document.

$this->crawler->filter('p')->each(function (Crawler $node, $i) {
    // process text
});

I'm trying to gather all text within the <body> that is outside of any child element.

<body>
    This is an example
    <p>
        blablabla
    </p>
    another example
    <p>
        <span>Yo!</span>
        again, another piece of text <br/>
        with an annoy BR in the middle
    </p>
</body>

I'm using PHP Symfony and can use XPath (preferred) or RegEx.

Answer 1

The string value of the entire document can be obtained with this simple XPath:

string(/)

All text nodes in the document would be:

//text()

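The immediate text node children of body would be the following (this assumes body is the root element, as in the sample; in a full HTML document, where body sits under html, use /html/body/text() or //body/text() instead):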

/body/text()

Note that XPath expressions that select text nodes are typically converted to concatenated string values, depending on the context in which they are evaluated.
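
As a rough illustration of the expressions above, here is a minimal sketch that uses PHP's bundled DOMDocument and DOMXPath classes rather than DomCrawler. It assumes the sample markup is wrapped in an html element, so //body/text() is used instead of /body/text(); the variable names ($html, $outsideText, $allText) are made up for this example. With DomCrawler, the same XPath could presumably be passed to filterXPath(), though that is not tested here.

```php
<?php
// Sample markup from the question, wrapped in <html> (an assumption of this sketch).
$html = <<<'HTML'
<html><body>
    This is an example
    <p>
        blablabla
    </p>
    another example
    <p>
        <span>Yo!</span>
        again, another piece of text <br/>
        with an annoy BR in the middle
    </p>
</body></html>
HTML;

$doc = new DOMDocument();
// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Text nodes that are direct children of <body>, i.e. text outside any child element.
$outsideText = [];
foreach ($xpath->query('//body/text()') as $textNode) {
    $trimmed = trim($textNode->nodeValue);
    if ($trimmed !== '') {
        // Skip whitespace-only nodes that sit between elements.
        $outsideText[] = $trimmed;
    }
}

// String value of the whole document (all text, including text inside elements).
$allText = $xpath->evaluate('string(/)');

print_r($outsideText);
```

The whitespace-only text nodes between elements are filtered out after trimming, so only the two loose pieces of text directly under body should remain in $outsideText.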

Solution 1
0 Accepted 2016-06-01 13:49:12
