Getting text with PHP Simple HTML DOM Parser

Question

I'm using PHP Simple HTML DOM Parser to get text from a webpage. The page I need to manipulate looks something like this:

<html>
<head>
<title>title</title>
</head>
<body>
<div id="content">
<h1>HELLO</h1>
Hello, world!
</div>
</body>
</html>

I need to get the h1 element and the text that is not wrapped in any tag. To get the h1 I use this code:

$html = file_get_html("remote_page.html");
foreach($html->find('#content') as $text){
echo "H1: ".$text->find('h1', 0)->plaintext;
}

But what about the other text? I also tried this inside the foreach, but I get the full text:

$text->plaintext;

but that also returned the contents of the h1 tag...

Answer 1

You can simply strip the HTML tags using strip_tags:

<?php
strip_tags($input, '<br>');
?>

Answer 2

Use strip_tags, as @Peachy pointed out. However, passing <br> as the second argument means strip_tags will leave <br> tags in place, which is unnecessary. In your case,

<?php
    strip_tags($text);
?>

would work as you'd like, given that you are only selecting content inside the element with the id content.
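To make the difference between the two calls concrete, here is a minimal sketch using only PHP's built-in strip_tags. The $fragment string is a hypothetical stand-in for the inner HTML of the content div, with a <br> added purely for illustration (the page in the question has none). Note that strip_tags removes only the markup, so the heading text stays in the result.

```php
<?php
// Hypothetical fragment modelled on the question's #content div;
// the <br> is added here only to show the effect of the second argument.
$fragment = "<h1>HELLO</h1>\nHello, world!<br>\n";

// No second argument: every tag is removed, the text is kept.
$plain = strip_tags($fragment);

// '<br>' as the allowed-tags argument: <br> is kept, other tags are removed.
$keepBr = strip_tags($fragment, '<br>');

echo $plain;
echo $keepBr;
```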

Answer 3

Try this:

echo "H1: ".$text->find('h1', 0)->innertext;

Answer 4

It looks like $text->find('text',2); gets what you're looking for; however, I'm not sure how well that will work when the number of text nodes is unknown. I'll keep looking.
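For the case where the number of text nodes is not known in advance, one possible approach is to walk the direct children of the div and keep only the text nodes. The sketch below is an assumption-laden alternative, not the Simple HTML DOM Parser API: it uses PHP's built-in DOM extension (DOMDocument and DOMXPath), and it inlines the sample page in a $source string instead of fetching remote_page.html. The variable names are illustrative.

```php
<?php
// Sample markup from the question, inlined so the sketch runs on its own.
$source = <<<HTML
<html>
<head>
<title>title</title>
</head>
<body>
<div id="content">
<h1>HELLO</h1>
Hello, world!
</div>
</body>
</html>
HTML;

// Parse with the built-in DOM extension (not Simple HTML DOM Parser).
$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($source);
libxml_clear_errors();

// Locate the div with id="content".
$xpath = new DOMXPath($doc);
$content = $xpath->query('//div[@id="content"]')->item(0);

$heading = '';
$looseText = [];
foreach ($content->childNodes as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === 'h1') {
        // Text of the h1 element.
        $heading = trim($child->textContent);
    } elseif ($child->nodeType === XML_TEXT_NODE) {
        // Text sitting directly inside the div, outside any tag.
        $piece = trim($child->textContent);
        if ($piece !== '') {
            $looseText[] = $piece;
        }
    }
}

echo "H1: ", $heading, PHP_EOL;
echo "Text: ", implode(' ', $looseText), PHP_EOL;
```

Because the loop skips whitespace-only nodes and collects every remaining direct text node, it does not depend on a fixed index such as 2.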

Answer 1
0 2016-12-14 03:41:41

Answer 2
0 2016-12-14 04:05:47

Answer 3
0 2021-06-24 10:14:48

Answer 4
0 2012-03-24 19:00:06