How to get all TEXT outside elements in an HTML document
I'm using Symfony DomCrawler to get all text in a document.
$this->crawler->filter('p')->each(function (Crawler $node, $i) {
// process text
});
I'm trying to gather all the text within <body> that sits outside of any child element.
<body>
This is an example
<p>
blablabla
</p>
another example
<p>
<span>Yo!</span>
again, another piece of text <br/>
with an annoying BR in the middle
</p>
</body>
I'm using PHP with Symfony and can use XPath (preferred) or a regex.
The string value of the entire document can be obtained with this simple XPath:
string(/)
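A minimal sketch of evaluating string(/) with PHP's built-in DOM extension (ext-dom), which is what DomCrawler uses under the hood. The compact HTML string is a hypothetical, shortened version of the sample above. Note that the result also contains the text inside <p>, so on its own this does not answer the question.

```php
<?php
// Compact, hypothetical version of the question's sample markup.
$html = '<body>This is an example<p>blablabla</p>another example</body>';

$doc = new DOMDocument();
// LIBXML_NOERROR suppresses libxml error reports for the bare fragment.
$doc->loadHTML($html, LIBXML_NOERROR);
$xpath = new DOMXPath($doc);

// evaluate() returns a plain PHP string when the expression yields an XPath string.
$all = $xpath->evaluate('string(/)');

// The string value of the root concatenates ALL descendant text, including "blablabla".
echo $all, PHP_EOL;
```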
All text nodes in the document would be:
//text()
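To see what //text() actually selects, here is a short sketch that iterates the node-set returned by DOMXPath::query() (again on a hypothetical, compact sample). Each item is a DOMText node, and text nested inside <p> is included, which is why this expression is also broader than what the question asks for.

```php
<?php
// Compact, hypothetical sample markup.
$html = '<body>This is an example<p>blablabla</p>another example</body>';

$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_NOERROR);
$xpath = new DOMXPath($doc);

$texts = [];
// query() returns a DOMNodeList; with //text() every item is a DOMText node.
foreach ($xpath->query('//text()') as $node) {
    $texts[] = $node->nodeValue;
}

// Contains the direct body text AND the text nested inside <p>.
print_r($texts);
```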
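The immediate text node children of body would be:
/body/text()
This assumes body is the root element, as in the sample above. In a full HTML document body sits under html, so use /html/body/text() or //body/text() instead.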
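Applied to the question's sample, a sketch using PHP's DOMDocument/DOMXPath might look like the following. It assumes the markup is parsed with loadHTML(), which wraps it in an <html> element, so body is not the root and /body/text() would match nothing; //body/text() is used instead. The [normalize-space()] predicate and the trim() call are choices made here to drop whitespace-only text nodes and surrounding newlines; they are not part of the answer above.

```php
<?php
// The question's sample markup.
$html = <<<HTML
<body>
This is an example
<p>
blablabla
</p>
another example
<p>
<span>Yo!</span>
again, another piece of text <br/>
with an annoying BR in the middle
</p>
</body>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_NOERROR);
$xpath = new DOMXPath($doc);

// body ends up under <html>, hence //body rather than /body.
// [normalize-space()] is true only for text nodes with non-whitespace content.
$outside = [];
foreach ($xpath->query('//body/text()[normalize-space()]') as $node) {
    // Strip the newlines that surround each text node in the source.
    $outside[] = trim($node->nodeValue);
}

print_r($outside);
```

Text inside <p>, <span> and around <br/> is skipped because those text nodes are children of <p> or <span>, not of body.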
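Note that how the selected text nodes come back depends on context. In XPath 1.0 (the version PHP's DOMXPath implements), converting a node-set to a string, e.g. string(//body/text()), yields only the string value of the first node in document order, so to collect every text node you should iterate over the node-set instead.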
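A small sketch contrasting the two behaviours, on a hypothetical compact sample: string() applied to the node-set keeps only the first text node, while iterating with query() and joining in PHP keeps all of them. The space used as a separator in implode() is an arbitrary choice.

```php
<?php
// Compact, hypothetical sample markup.
$html = '<body>This is an example<p>blablabla</p>another example</body>';

$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_NOERROR);
$xpath = new DOMXPath($doc);

// XPath 1.0: a node-set converted to a string keeps only the first node in document order.
$first = $xpath->evaluate('string(//body/text())');

// Iterating keeps every selected text node; the concatenation happens in PHP.
$parts = [];
foreach ($xpath->query('//body/text()') as $node) {
    $parts[] = $node->nodeValue;
}
$joined = implode(' ', $parts);

var_dump($first);  // only the first direct text child of body
var_dump($joined); // both direct text children of body
```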