
Remove specific li element from list when parsing page with simple_html_dom

I am scraping a page with simple_html_dom. The page contains a ul list whose li elements I need to extract. The problem is that the video tags I want are mixed in with other li elements that I don't need.

Here is an example of original page source:

<ul id="video-tags">
            <li>Uploader: </li>
    <li class="profile_name"><a href="/profiles/sarasubmit">Sarasubmit</a>.</li>
            <li><em>Tagged: </em></li>
                    <li><a href="/tags/makeup">makeup</a>, </li>
                            <li><a href="/tags/cosmetic">cosmetic</a>, </li>
                            <li><a href="/tags/liner">liner</a>, </li>
                            <li><a href="/tags/fresh">fresh</a>, </li>
                            <li><a href="/tags/girls">girls</a>, </li>
                            <li><a href="/tags/fashion">fashion</a>, </li>
                    <li>more <a href="/tags/"><strong>tags</strong></a>.</li>
  </ul>

After fetching the page, I tried this to get the tags:

$get_tags = $video_page_url->find('ul[id="video-tags"]', 0);

$post_tags_arr = array();
foreach($get_tags->find('a') as $tag) {
    $post_tags_arr[] = $tag->plaintext;
}
$post_tags = implode(', ', $post_tags_arr);

This collects the text of every a element inside the li elements. But the profile name and the "more tags" entry are links too, so I get those two as well and end up with this:

Sarasubmit, makeup, cosmetic, liner, fresh, girls, fashion, tags

Is there a way to keep just the tags and drop the other elements, so that I end up with this:

 makeup, cosmetic, liner, fresh, girls, fashion,

Edit: Just to mention, the username is not constant; it changes depending on who uploaded the video. Also, some videos have no tags at all, and others have more or fewer tags, so the content is dynamic.

You may try something like this:

foreach($get_tags->find('li[!class] a') as $tag) {
    if($tag->plaintext != 'tags') $post_tags_arr[] = $tag->plaintext;
}

Instead of this:

foreach($get_tags->find('a') as $tag) {
    $post_tags_arr[] = $tag->plaintext;
}

Update: I've tested:

$htmlStr = '<ul id="video-tags">
    <li>Uploader: </li>
    <li class="profile_name"><a href="/profiles/sarasubmit">Sarasubmit</a>.</li>
    <li><em>Tagged: </em></li>
    <li><a href="/tags/makeup">makeup</a>, </li>
    <li><a href="/tags/cosmetic">cosmetic</a>, </li>
    <li><a href="/tags/liner">liner</a>, </li>
    <li><a href="/tags/fresh">fresh</a>, </li>
    <li><a href="/tags/girls">girls</a>, </li>
    <li><a href="/tags/fashion">fashion</a>, </li>
    <li>more <a href="/tags/"><strong>tags</strong></a>.</li>
</ul>';

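require_once 'simple_html_dom.php'; // assumes the library file sits next to this script
$html = str_get_html($htmlStr);
$post_tags_arr = array(); // start with an empty list
foreach($html->find('li[!class] a') as $tag) {
    if($tag->plaintext != 'tags') $post_tags_arr[] = $tag->plaintext;
}
print_r($post_tags_arr);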

Output:

Array
(
    [0] => makeup
    [1] => cosmetic
    [2] => liner
    [3] => fresh
    [4] => girls
    [5] => fashion
)

So, try this:

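$html = file_get_html($url);
$get_tags = $html->find('ul[id="video-tags"]', 0); // limit the search to the tag list
$post_tags_arr = array();
foreach($get_tags->find('li[!class] a') as $tag) {
    if($tag->plaintext != 'tags') $post_tags_arr[] = $tag->plaintext;
}
$post_tags = implode(', ', $post_tags_arr);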

Check the simple_html_dom manual for the other attribute filters.
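
One caveat with comparing the link text against 'tags': a video that really is tagged "tags" would be dropped too. A possible variation is to filter on the href instead, keeping only links that point below /tags/. The sketch below is not simple_html_dom; it swaps in PHP's built-in DOMDocument and DOMXPath so that it runs without any extra files. The function name extract_tags is made up for illustration, and the code assumes that every real tag link has an href of the form /tags/<name>, as in the sample markup from the question.

```php
<?php
// Sample markup copied from the question.
$htmlStr = '<ul id="video-tags">
    <li>Uploader: </li>
    <li class="profile_name"><a href="/profiles/sarasubmit">Sarasubmit</a>.</li>
    <li><em>Tagged: </em></li>
    <li><a href="/tags/makeup">makeup</a>, </li>
    <li><a href="/tags/cosmetic">cosmetic</a>, </li>
    <li><a href="/tags/liner">liner</a>, </li>
    <li><a href="/tags/fresh">fresh</a>, </li>
    <li><a href="/tags/girls">girls</a>, </li>
    <li><a href="/tags/fashion">fashion</a>, </li>
    <li>more <a href="/tags/"><strong>tags</strong></a>.</li>
</ul>';

// Hypothetical helper: returns the tag names found in the #video-tags list.
function extract_tags(string $html): array
{
    // Real pages are rarely valid HTML, so keep libxml warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Keep links whose href starts with "/tags/" and is longer than the
    // 6 characters of "/tags/" itself, which skips the "more tags" link.
    $query = '//ul[@id="video-tags"]/li/a'
           . '[starts-with(@href, "/tags/") and string-length(@href) > 6]';

    $tags = [];
    foreach ($xpath->query($query) as $link) {
        $tags[] = trim($link->textContent);
    }
    return $tags;
}

$post_tags = implode(', ', extract_tags($htmlStr));
echo $post_tags, "\n"; // makeup, cosmetic, liner, fresh, girls, fashion
```

A video without tags simply yields an empty array, so $post_tags becomes an empty string. Because the profile link points to /profiles/..., the href test already skips it without relying on its class.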
