
Get text with PHP Simple HTML DOM Parser

I'm using PHP Simple HTML DOM Parser to get text from a webpage. The page I need to manipulate looks something like this:

<html>
<head>
<title>title</title>
</head>
<body>
<div id="content">
<h1>HELLO</h1>
Hello, world!
</div>
</body>
</html>

I need to get the h1 element and the text that is not wrapped in any tag. To get the h1 I use this code:

$html = file_get_html("remote_page.html");
foreach($html->find('#content') as $text){
echo "H1: ".$text->find('h1', 0)->plaintext;
}

But what about the other text? I also tried this inside the foreach, but I get the full text:

$text->plaintext;

but that also returned the content of the H1 tag...

You can simply strip the HTML tags using strip_tags:

<?php
strip_tags($input, '<br>');
?>

Use strip_tags, as @Peachy pointed out. However, passing <br> as the second argument means strip_tags will leave <br> tags in the string, which is unnecessary. In your case,

<?php
    strip_tags($text);
?>

would work as you'd like, given that you are only selecting content inside the element with the id content.
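For reference, here is a small sketch of what strip_tags returns with and without the second argument. The $fragment string is a made-up stand-in for the inner HTML of the content div (the <br> was added for illustration and is not in the question's page). Note that strip_tags removes the tags themselves but keeps the text inside them, so the heading text is still part of the result.

```php
<?php
// Hypothetical fragment standing in for the inner HTML of #content.
// The <br> is added only to show the effect of the second argument.
$fragment = "<h1>HELLO</h1>\nHello, <br>world!";

// No second argument: every tag is removed, the text inside tags stays.
$all = strip_tags($fragment);

// Second argument: the listed tags are kept in the output.
$keepBr = strip_tags($fragment, '<br>');

echo $all, "\n";
echo $keepBr, "\n";
```

If the heading text must be excluded, stripping tags alone may not be enough; the text nodes would have to be selected separately.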

Try this:

echo "H1: ".$text->find('h1', 0)->innertext;

It looks like $text->find('text', 2); gets what you're looking for; however, I'm not sure how well that will work when the number of text nodes is unknown. I'll keep looking.
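When the number of text nodes is unknown, one possible approach is to walk the direct children of the container and keep only the text nodes. The sketch below does this with PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM, so it is a swapped-in technique and not the library's own API. The markup is inlined from the question, and the variable names ($markup, $heading, $looseText, $rest) are chosen just for this example.

```php
<?php
// Sample markup from the question, inlined so the sketch runs on its own.
$markup = <<<HTML
<html>
<head>
<title>title</title>
</head>
<body>
<div id="content">
<h1>HELLO</h1>
Hello, world!
</div>
</body>
</html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($markup);

// Locate the container by its id attribute.
$xpath = new DOMXPath($doc);
$content = $xpath->query('//div[@id="content"]')->item(0);

$heading = '';
$looseText = [];
foreach ($content->childNodes as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === 'h1') {
        // Text of the h1 element.
        $heading = trim($child->textContent);
    } elseif ($child->nodeType === XML_TEXT_NODE && trim($child->nodeValue) !== '') {
        // Direct text children only; whitespace-only nodes are skipped.
        $looseText[] = trim($child->nodeValue);
    }
}
$rest = implode(' ', $looseText);

echo "H1: ", $heading, "\n";
echo "Text: ", $rest, "\n";
```

Because only direct children of the div are inspected, text nested inside other elements is left out, which seems to match what the question asks for.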
