file_get_html() fetching plain text inside a div, but avoiding all other tags

I am using file_get_html() to fetch some external HTML, but I have an issue: I cannot seem to target the text inside a div while leaving out the rest of its content.

Let's say the layout is this:

<div class="post">
    <h1>Andromeda v1.4 – WordPress – The Beauty of Simplicity</h1>
    <div class="infos b20">
    <img class="post_img" src="/imagini/512b93babf84b.jpg" alt="Andromeda v1.4 – WordPress – The Beauty of Simplicity">
    <div style="width:610px; margin:10px 0; overflow:hidden; display:block;">
    Andromeda is a clean theme with functional CMS and unique features. A massive pack of backend CMS options was created for this product to give you full control while creating and editing the site and its features. The main idea behind this theme was to create a something clean and simple, useful, nice looking and easy to modify.
    <p></p>
    <h6>Demo</h6>
    <code>http://themeforest.net/item/andromeda-wordpress-the-beauty-of-simplicity/107876</code>
    <h6>Download:</h6>
    <div class="link alert clearfix">
    <div class="link alert clearfix">
    <div class="link alert clearfix">
    <div class="link alert clearfix">
    <div class="link alert clearfix">
    <div class="link alert clearfix">
    <p></p>
    <ul id="social_post" class="clearfix sharingbtns">
    <div class="comments">
</div>

If I do

$text = $dom->find('div[class=post]');
$text = $text[0]->plaintext;

I get all the content, including the text of every nested element. I only want the text that sits directly inside the main div with the class post, not the rest of the content.

What would be the best way to achieve this?

The text and the number of other divs vary, but the div with class post and the text will always be there, in the same position.

EDIT: To elaborate, I only want the text that is inside post and is not wrapped in any tag.

Just to answer you quickly, without checking whether it works:

http://simplehtmldom.sourceforge.net/manual_api.htm

Try this:

 $text = $dom->find('div[class=post]');
 $text = $text[0]->innertext;

or:

 $text = $dom->find('div[class=post]');
 $text = $text[0]->outertext;

By the way:

 <div style="width:610px; margin:10px 0; overflow:hidden; display:block;">

has no closing tag, so there is no text that's inside the DIV you're talking about. Please clarify.

 $res = $html->find('div[class=post]',0)->plaintext;
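Note that plaintext, innertext and outertext all include the content of nested elements, so none of them isolates the untagged text on its own. As a rough sketch of one alternative, and not something taken from the answers above, PHP's built-in DOMDocument and DOMXPath can select only the text nodes that are direct children of the div. The sample markup and the function name directText are made up for illustration, and the markup is a simplified stand-in for the real page:

```php
<?php
// Simplified, hypothetical stand-in for the page in the question.
$html = <<<HTML
<div class="post">
    <h1>Andromeda v1.4</h1>
    <div class="infos b20">meta</div>
    Andromeda is a clean theme with functional CMS and unique features.
    <h6>Demo</h6>
    <code>http://example.com/demo</code>
    <div class="comments">comments</div>
</div>
HTML;

// Hypothetical helper: returns the text that is a direct child of the
// first-level match for div.$class, ignoring text inside nested tags.
function directText(string $html, string $class): string
{
    $dom = new DOMDocument();
    // Real-world markup is rarely well-formed; collect warnings silently.
    libxml_use_internal_errors(true);
    // The XML declaration prefix is a common trick to make loadHTML read UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Match the class as a whole word, then take only direct text() children.
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]/text()";

    $parts = [];
    foreach ($xpath->query($query) as $node) {
        $trimmed = trim($node->nodeValue);
        if ($trimmed !== '') {      // skip whitespace-only nodes between tags
            $parts[] = $trimmed;
        }
    }
    return implode(' ', $parts);
}

echo directText($html, 'post'), PHP_EOL;
```

If you would rather stay with Simple HTML DOM, the same idea should apply: walk the child nodes of the div and keep only the text nodes. I have not tested that variant, so treat it as a direction rather than a recipe.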

 