I am scraping a page with simple_html_dom. The page contains a ul list whose li elements hold the video tags I need, but the tags are mixed in with other elements that I don't want.
Here is an example of the original page source:
<ul id="video-tags">
<li>Uploader: </li>
<li class="profile_name"><a href="/profiles/sarasubmit">Sarasubmit</a>.</li>
<li><em>Tagged: </em></li>
<li><a href="/tags/makeup">makeup</a>, </li>
<li><a href="/tags/cosmetic">cosmetic</a>, </li>
<li><a href="/tags/liner">liner</a>, </li>
<li><a href="/tags/fresh">fresh</a>, </li>
<li><a href="/tags/girls">girls</a>, </li>
<li><a href="/tags/fashion">fashion</a>, </li>
<li>more <a href="/tags/"><strong>tags</strong></a>.</li>
</ul>
After loading the page, I tried this to get the tags:
$get_tags = $video_page_url->find('ul[id="video-tags"]', 0);
$post_tags_arr = array();
foreach($get_tags->find('a') as $tag) {
$post_tags_arr[] = $tag->plaintext;
}
$post_tags = implode(', ', $post_tags_arr);
This collects the text of every a element inside the li elements. But the profile name and the "more tags" entry are links too, so I get those two as well and end up with this:
sarasubmit, makeup, cosmetic, liner, fresh, girls, fashion, tags
Is there a way to keep only the tags and drop the other elements, so that I end up with this:
makeup, cosmetic, liner, fresh, girls, fashion,
Edit: The username is not constant; it changes depending on who uploaded the video. Also, some videos have no tags at all, and the number of tags varies from video to video, so the content is dynamic.
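You may try something like this. The li[!class] selector matches only li elements that have no class attribute, which skips the profile_name item, and the plaintext check skips the "more tags" link: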
foreach($get_tags->find('li[!class] a') as $tag) {
if($tag->plaintext != 'tags') $post_tags_arr[] = $tag->plaintext;
}
Instead of this:
foreach($get_tags->find('a') as $tag) {
$post_tags_arr[] = $tag->plaintext;
}
Update: I've tested:
$htmlStr = '<ul id="video-tags">
<li>Uploader: </li>
<li class="profile_name"><a href="/profiles/sarasubmit">Sarasubmit</a>.</li>
<li><em>Tagged: </em></li>
<li><a href="/tags/makeup">makeup</a>, </li>
<li><a href="/tags/cosmetic">cosmetic</a>, </li>
<li><a href="/tags/liner">liner</a>, </li>
<li><a href="/tags/fresh">fresh</a>, </li>
<li><a href="/tags/girls">girls</a>, </li>
<li><a href="/tags/fashion">fashion</a>, </li>
<li>more <a href="/tags/"><strong>tags</strong></a>.</li>
</ul>';
$html = str_get_html($htmlStr);
$post_tags_arr = array(); // start with an empty list so print_r works even when no tags match
foreach($html->find('li[!class] a') as $tag) {
    if($tag->plaintext != 'tags') $post_tags_arr[] = $tag->plaintext;
}
print_r($post_tags_arr);
Output:
Array
(
[0] => makeup
[1] => cosmetic
[2] => liner
[3] => fresh
[4] => girls
[5] => fashion
)
So, try this:
$html = file_get_html($url);
$post_tags_arr = array(); // stays empty for videos that have no tags
foreach($html->find('li[!class] a') as $tag) {
    if($tag->plaintext != 'tags') $post_tags_arr[] = $tag->plaintext;
}
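A possible alternative, in case matching on the link text 'tags' feels fragile: filter on the link target instead. The sketch below is not part of the tested answer above. It uses PHP's built-in DOMDocument and DOMXPath rather than simple_html_dom so that it runs on its own, and it assumes that every real tag link has an href of the form /tags/<name>, as in the sample markup. The function name extract_video_tags and the shortened sample string are made up for illustration.

```php
<?php
// Sketch: collect tag names by link target instead of by link text.
// Assumption: every real tag link has the form /tags/<name>.

function extract_video_tags(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally; real-world markup is rarely clean.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Anchors inside ul#video-tags whose href starts with "/tags/".
    $links = $xpath->query('//ul[@id="video-tags"]//a[starts-with(@href, "/tags/")]');

    $tags = [];
    foreach ($links as $link) {
        // Skip the bare "/tags/" link ("more tags"), which names no tag.
        if ($link->getAttribute('href') === '/tags/') {
            continue;
        }
        $tags[] = trim($link->textContent);
    }
    return $tags;
}

// Shortened, hypothetical sample based on the markup in the question.
$sample = '<ul id="video-tags">
<li>Uploader: </li>
<li class="profile_name"><a href="/profiles/sarasubmit">Sarasubmit</a>.</li>
<li><em>Tagged: </em></li>
<li><a href="/tags/makeup">makeup</a>, </li>
<li><a href="/tags/cosmetic">cosmetic</a>, </li>
<li>more <a href="/tags/"><strong>tags</strong></a>.</li>
</ul>';

echo implode(', ', extract_video_tags($sample)), "\n";
```

Because the profile link points to /profiles/... it never matches, whatever the username is, and a list with no tag links simply yields an empty array.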