How to get the img src of a website having spaces in their class name with simple_html_dom.php?
Get img src with PHP Simple HTML DOM
I need to get the image src from the following code.
HTML
<div class="avatar profile_CF48B2B4A31B43EC96F0561F498CE6BF ">
<a onclick="">
<img id="lazyload_-247847544_0" height="74" width="74" class="avatar potentialFacebookAvatar avatarGUID:CF48B2B4A31B43EC96F0561F498CE6BF" src="http://media-cdn.tripadvisor.com/media/photo-l/05/f3/67/c3/lilrazzy.jpg" />
</a>
</div>
This is what I tried (PHP with simple_html_dom):
foreach($html->find('div[class=profile_CF48B2B4A31B43EC96F0561F498CE6BF] a img') as $element) {
$img = $element->getAttribute('src');
echo $img;
}
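But it reports that the src key does not exist. How can I scrape the reviewers' avatar images?
Update:
When I view the page source I cannot find the image URL, although Firebug shows it. In the raw source the img tag has no src attribute: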
<img id='lazyload_1953171323_17' height='24' alt='4 helpful votes' width='25' class='icon lazy'/>
This is the source of my page:
<div class="col1of2">
<div class="member_info">
<div id="UID_3E0FAF58557D3375508A9E5D9A7BD42F-SRC_175428572" class="memberOverlayLink" onmouseover="ta.trackEventOnPage('Reviews','show_reviewer_info_window','user_name_photo'); ta.call('ta.overlays.Factory.memberOverlayWOffset', event, this, 's3 dg rgba_gry update2012', 0, (new Element(this)).getElement('.avatar')&&(new Element(this)).getElement('.avatar').getStyle('border-radius')=='100%'?-10:0);">
<div class="avatar profile_3E0FAF58557D3375508A9E5D9A7BD42F ">
<a onclick=>
<img id='lazyload_1953171323_15' height='74' width='74' class='avatar potentialFacebookAvatar avatarGUID:3E0FAF58557D3375508A9E5D9A7BD42F'/>
</a>
</div>
<div class="username mo">
<span class="expand_inline scrname hvrIE6 mbrName_3E0FAF58557D3375508A9E5D9A7BD42F" onclick="ta.trackEventOnPage('Reviews', 'show_reviewer_info_window', 'user_name_name_click')">Prataspeles</span>
</div>
</div>
<div class="location">
Latvia
</div>
</div>
<div class="memberBadging">
<div id="UID_3E0FAF58557D3375508A9E5D9A7BD42F-CONT" class="totalReviewBadge badge no_cpu" onclick="ta.trackEventOnPage('Reviews','show_reviewer_info_window','review_count'); ta.util.cookie.setPIDCookie('15984'); ta.call('ta.overlays.Factory.memberOverlayWOffset', event, this, 's3 dg rgba_gry update2012', -10, -50);">
<div class="reviewerTitle">Reviewer</div>
<img id='lazyload_1953171323_16' height='24' alt='4 reviews' width='25' class='icon lazy'/>
<span class="badgeText">4 reviews</span>
</div>
<div id="UID_3E0FAF58557D3375508A9E5D9A7BD42F-HV" class="helpfulVotesBadge badge no_cpu" onclick="ta.trackEventOnPage('Reviews','show_reviewer_info_window','helpful_count'); ta.util.cookie.setPIDCookie('15983'); ta.call('ta.overlays.Factory.memberOverlayWOffset', event, this, 's3 dg rgba_gry update2012', -22, -50);">
<img id='lazyload_1953171323_17' height='24' alt='4 helpful votes' width='25' class='icon lazy'/>
<span class="badgeText">4 helpful votes</span>
</div>
</div>
</div>
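Could the problem be caused by lazyload?
Update 2
Lazyload loads my images after the page has loaded. I tried to get the image IDs and compare them with the lazyload JS array, but the IDs do not match the entries in the lazyload var array.
Question:
How can I get this JS array as JSON?
Example: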
{"id":"lazyload_-205858383_0","tagType":"img","scroll":true,"priority":100,"data":"http://media-cdn.tripadvisor.com/media/photo-l/05/f3/67/c3/lilrazzy.jpg"}
, {"id":"lazyload_-205858383_1","tagType":"img","scroll":true,"priority":100,"data":"http://c1.tacdn.com/img2/icons/gray_flag.png"}
, {"id":"lazyload_-205858383_2","tagType":"img","scroll":true,"priority":100,"data":"http://media-cdn.tripadvisor.com/media/photo-l/01/2a/fd/98/avatar.jpg"}
, {"id":"lazyload_-205858383_3","tagType":"img","scroll":true,"priority":100,"data":"http://c1.tacdn.com/img2/icons/gray_flag.png"}
, {"id":"lazyload_-205858383_4","tagType":"img","scroll":true,"priority":100,"data":"http://media-cdn.tripadvisor.com/media/photo-l/01/2e/70/5e/avatar036.jpg"}
, {"id":"lazyload_-205858383_5","tagType":"img","scroll":false,"priority":100,"data":"http://c1.tacdn.com/img2/badges/badge_helpful.png"}
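You are having trouble because JavaScript is used to lazy-load the images once the page has loaded. Use phpDom to find the element's ID, then use a regular expression to find the related image based on that ID.
To do this, try something like the following: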
$json = json_decode("<JSONSTRING HERE>", true); // true: decode to associative arrays so $lazy["id"] works
foreach($html->find('div[class=profile_CF48B2B4A31B43EC96F0561F498CE6BF] a img') as $element) {
$imgId = $element->getAttribute('id');
foreach ($json as $lazy)
{
if ($lazy["id"] == $imgId) echo $lazy["data"];
}
}
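The above is untested, so you will need to work out the kinks. The key is to extract the relevant JavaScript and convert it to JSON.
Alternatively, you could use string search functions to find the line that contains the information about the img and extract the value you need.
If you want to find all IDs that contain the substring "lazyload", you can try a wildcard selector and then look at the "src" attribute of the elements it finds. See the snippet below. Good luck!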
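The string-search alternative could look roughly like the sketch below. It uses only PHP built-ins. The helper name lazyImageUrl() and the $page sample are hypothetical, and the variable name lazyImgs is taken from the asker's final code rather than verified against the live page. It also assumes the URLs appear in the script without JSON escaping (no "\/").

```php
<?php
// Hypothetical sample of the inline script; on the real page this text
// would be part of the downloaded HTML.
$page = <<<'HTML'
<script>
var lazyImgs = [
{"id":"lazyload_-205858383_0","tagType":"img","scroll":true,"priority":100,"data":"http://media-cdn.tripadvisor.com/media/photo-l/05/f3/67/c3/lilrazzy.jpg"}
, {"id":"lazyload_-205858383_1","tagType":"img","scroll":true,"priority":100,"data":"http://c1.tacdn.com/img2/icons/gray_flag.png"}
];
</script>
HTML;

// lazyImageUrl() is a hypothetical helper, not part of simple_html_dom.
function lazyImageUrl(string $page, string $imgId): ?string
{
    // Find the one object whose "id" equals $imgId and capture its "data" value.
    $pattern = '/\{"id":"' . preg_quote($imgId, '/') . '"[^}]*"data":"([^"]*)"/';
    if (preg_match($pattern, $page, $m) === 1) {
        return $m[1];
    }
    return null; // no lazyload entry for this id
}

echo lazyImageUrl($page, 'lazyload_-205858383_1'), PHP_EOL;
```

This avoids decoding the whole array when only one image is needed, at the cost of depending on the exact key order ("id" before "data") seen in the example.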
$(document.body).find('img[id*=lazyload]').each(function() {
console.log($(this).prop('src'));
});
Try this:
foreach($html->find('div[class=profile_CF48B2B4A31B43EC96F0561F498CE6BF ] a img') as $element) {
$img = $element->getAttribute('src');
echo $img;
}
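There is a space after the class name, so you have to add a space at the end of the class name in the selector.
Or
even use the full class name: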
$html->find('div[class=avatar profile_CF48B2B4A31B43EC96F0561F498CE6BF ] a img');
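If matching on the exact attribute string feels fragile, one alternative is to match a single class token regardless of surrounding whitespace. The sketch below does this with PHP's built-in DOMDocument and DOMXPath rather than simple_html_dom, so it is a swapped-in technique, not the library's own selector syntax. The $markup sample is taken from the question. Note that this only helps when the src attribute is actually present in the downloaded HTML.

```php
<?php
// Sample markup from the question; the class attribute has a trailing space.
$markup = '<div class="avatar profile_CF48B2B4A31B43EC96F0561F498CE6BF ">'
        . '<a><img id="lazyload_-247847544_0" src="http://media-cdn.tripadvisor.com/media/photo-l/05/f3/67/c3/lilrazzy.jpg" /></a>'
        . '</div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world markup is rarely valid; keep warnings quiet
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$token = 'profile_CF48B2B4A31B43EC96F0561F498CE6BF';

// Pad the normalized class list with spaces so the token matches as a whole
// word, no matter how many spaces surround it in the source.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' $token ')]//img";

$sources = [];
foreach ($xpath->query($query) as $img) {
    $sources[] = $img->getAttribute('src'); // empty string if src is missing
}
print_r($sources);
```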
With a jQuery selector, i.e. $('#lazyload_-247847544_0'), you can get the image source like this:
var src = $('#lazyload_-247847544_0').attr('src');
Or, more specifically:
$('.profile_CF48B2B4A31B43EC96F0561F498CE6BF #lazyload_-247847544_0').attr('src');
Thanks
function getReviews(){
$url = 'http://www.tripadvisor.com/Hotel_Review-g274965-d952833-Reviews-Ezera_Maja-Liepaja_Kurzeme_Region.html';
$html = new simple_html_dom();
$html = file_get_html($url);
$array = array();
$i = 0;
// IMG ID
foreach($html->find('div[class=avatar] a img') as $element) {
$array[$i]['id'] = $element->getAttribute('id');
$i++;
}
$i = 0;
// IMG SRC
$p1 = strpos( $html, 'var lazyImgs =' ) + 14;
$p2 = strpos( $html, ']', $p1 );
$raw = substr( $html, $p1, $p2 - $p1 ) . ']';
$images = json_decode($raw);
foreach ($images as $image){
$id = $image->id;
$data = $image->data;
foreach ($array as $key => $element){
if ( isset($element['id']) && $element['id'] == $id){
// store the URL on the entry that matched, not on a running counter
$array[$key]['image'] = $data;
}
}
}
$html->clear();
unset($html);
return $array;
}
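Collect the IMG IDs in an array. Then extract the lazyload var from the page and decode it as JSON. Finally, compare the two arrays, and where the IDs match, add the data value to the array. Thanks, everyone!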
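The same idea can be sketched without nested loops by turning the decoded array into an id-to-URL lookup. This version is only an illustration under stated assumptions: it uses PHP's built-in DOMDocument instead of simple_html_dom, the function name avatarImages() and the shortened $page sample (including the class names profile_A and profile_B) are hypothetical, and it assumes the script declares the array as `var lazyImgs = [...];` as in the code above.

```php
<?php
// Hypothetical, shortened stand-in for the downloaded page.
$page = <<<'HTML'
<html><body>
<div class="avatar profile_A "><a><img id="lazyload_-205858383_0" class="avatar"/></a></div>
<div class="avatar profile_B "><a><img id="lazyload_-205858383_2" class="avatar"/></a></div>
<script>
var lazyImgs = [
{"id":"lazyload_-205858383_0","tagType":"img","scroll":true,"priority":100,"data":"http://media-cdn.tripadvisor.com/media/photo-l/05/f3/67/c3/lilrazzy.jpg"}
, {"id":"lazyload_-205858383_1","tagType":"img","scroll":true,"priority":100,"data":"http://c1.tacdn.com/img2/icons/gray_flag.png"}
, {"id":"lazyload_-205858383_2","tagType":"img","scroll":true,"priority":100,"data":"http://media-cdn.tripadvisor.com/media/photo-l/01/2a/fd/98/avatar.jpg"}
];
</script>
</body></html>
HTML;

// avatarImages() is a hypothetical helper name.
function avatarImages(string $page): array
{
    // 1. Cut the JavaScript array literal out of the page and decode it.
    if (preg_match('/var lazyImgs\s*=\s*(\[.*?\])\s*;/s', $page, $m) !== 1) {
        return [];
    }
    $entries = json_decode($m[1], true);
    if (!is_array($entries)) {
        return [];
    }
    // 2. Build a lookup table: img id => image URL.
    $lookup = array_column($entries, 'data', 'id');

    // 3. Collect the avatar img ids from the markup and join them with the lookup.
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($page);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' avatar ')]//img";

    $result = [];
    foreach ($xpath->query($query) as $img) {
        $id = $img->getAttribute('id');
        $result[] = ['id' => $id, 'image' => $lookup[$id] ?? null];
    }
    return $result;
}

print_r(avatarImages($page));
```

With the lookup table each image ID is resolved in one step, and an ID that has no lazyload entry simply gets a null image instead of shifting the other results.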