
Scrape HTML using the simple_html_dom.php library

There is an HTML document with the following format:

<div....>
  <map name="blah"
           .
           .
   />
  <map name="blah2"
          .
          .
   />
</div>

I always want to retrieve the second map. However, I want to make this completely dynamic.

$url = $_GET['url'];
$html_content = getHTML($url);
$html = str_get_html($html_content);

$map = $html->find('map[name=blah2]');

The lines above work perfectly fine. But as I mentioned, I don't want to supply the name manually; I just want to always take the second map. I also want to retrieve the name of that map.

Any ideas?

P.S. The code below doesn't work as expected. I've tried it before: it correctly returns the name of the map, but it doesn't display the content under the map.

   $map = $html->find('map',1);

What about:

$map = $html->find('map', 1);
echo $map->name;

It is very easy:

$map = $html->find('map', 1); // second <map>; indexes start at 0
if ($map !== null) {
    $name = $map->name; // value of the name attribute
}

You just had to look.
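
To also reach what sits inside the second map (the part the question says was not displayed), here is a minimal sketch. It assumes simple_html_dom.php is in the same directory as the script, and the sample markup, including the area elements, is made up for illustration. With an index, find() returns a single element (or null) rather than an array; the element's content can then be read through innertext or a nested find().

```php
<?php
// Assumption: simple_html_dom.php is in the same directory as this script.
require_once 'simple_html_dom.php';

// Hypothetical sample markup standing in for the fetched page.
$html_content = '<div>'
    . '<map name="blah"><area shape="rect" coords="0,0,10,10" href="a.html"></map>'
    . '<map name="blah2"><area shape="rect" coords="5,5,20,20" href="b.html"></map>'
    . '</div>';

$html = str_get_html($html_content);

// Index 1 is the second match (indexes are zero-based); null if there is none.
$map = $html->find('map', 1);

if ($map !== null) {
    echo $map->name, "\n";       // value of the name attribute
    echo $map->innertext, "\n";  // markup inside the map element

    // Walk the elements nested in the map.
    foreach ($map->find('area') as $area) {
        echo $area->href, "\n";
    }
}
```

In the original script, $html_content would come from the asker's own getHTML($url) helper instead of the inline string.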

You could use a port of jQuery (for example phpQuery, http://code.google.com/p/phpquery/), which gives you the eq() selector and generally enables fairly rich XML manipulation.
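
If pulling in another library is not desirable, the same "take the second element" selection can be sketched with PHP's bundled DOM extension and XPath. This is a swapped-in alternative, not phpQuery's API, and the sample markup is again hypothetical.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html_content = '<div>'
    . '<map name="blah"><area shape="rect" coords="0,0,10,10" href="a.html"></map>'
    . '<map name="blah2"><area shape="rect" coords="5,5,20,20" href="b.html"></map>'
    . '</div>';

$doc = new DOMDocument();
// Real pages are rarely valid markup; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html_content);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// XPath positions are one-based, so [2] selects the second map in document order.
$second = $xpath->query('(//map)[2]')->item(0);

$name = null;
$inner = '';
if ($second instanceof DOMElement) {
    $name = $second->getAttribute('name');
    // Serialize each child node to rebuild the map's inner markup.
    foreach ($second->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    echo $name, "\n", $inner, "\n";
}
```

The parentheses in (//map)[2] matter: they pick the second map in the whole document, whereas //map[2] would pick a map that is the second map child of its own parent.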
