How to get all text outside elements in an HTML document

I'm using Symfony DomCrawler to get all text in a document.

$this->crawler->filter('p')->each(function (Crawler $node, $i) {
    // process text
});

I'm trying to gather all text within the <body> that is outside of any child element.

<body>
    This is an example
    <p>
        blablabla
    </p>
    another example
    <p>
        <span>Yo!</span>
        again, another piece of text <br/>
        with an annoy BR in the middle
    </p>
</body>

I'm using PHP Symfony and can use XPath (preferred) or RegEx.

The string value of the entire document can be obtained with this simple XPath:

string(/)

All text nodes in the document would be:

//text()

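The immediate text node children of body would be selected by the following, which assumes body is the root element as in the sample above (in a full HTML document, where body sits under html, use /html/body/text() or //body/text() instead):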

/body/text()

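Note that, depending upon context, the result of an XPath that selects text nodes may be converted to a string value. In XPath 1.0, string() applied to a node-set returns the string value of only the first node in document order, so iterate over the selected nodes if you need each piece of text separately.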
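As a minimal sketch of the last XPath in PHP, the snippet below uses the built-in DOMDocument and DOMXPath classes rather than DomCrawler. It assumes the fragment from the question is well-formed and is parsed as XML with body as the root element, which is what makes /body/text() match; the variable names ($markup, $texts) are illustrative only. Whitespace-only text nodes between elements are skipped.

```php
<?php
// Sample markup from the question, parsed as XML so that <body> is the root.
$markup = <<<'XML'
<body>
    This is an example
    <p>
        blablabla
    </p>
    another example
    <p>
        <span>Yo!</span>
        again, another piece of text <br/>
        with an annoy BR in the middle
    </p>
</body>
XML;

$doc = new DOMDocument();
$doc->loadXML($markup);

$xpath = new DOMXPath($doc);

// Collect only the text nodes that are direct children of <body>.
$texts = [];
foreach ($xpath->query('/body/text()') as $node) {
    $value = trim($node->nodeValue);
    if ($value !== '') { // skip whitespace-only nodes between elements
        $texts[] = $value;
    }
}

print_r($texts);
```

Text inside the p and span elements is not collected, because text() here only matches direct children of body. For a real HTML page loaded with DOMDocument::loadHTML, the query would likely need to be //body/text(), since body is then no longer the root.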
