How to get innerHTML by class name or id using PHP

Question

Hi, I am loading content from an external URL, something like this:

$html=get_data($external_url);

where get_data() is a function that fetches the content using cURL.
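The question does not show get_data() itself. As a rough sketch only, a cURL-based version might look like the following. The body of the function, the $timeout parameter, and the choice to throw on failure are all assumptions made for illustration, not the asker's code.

```php
<?php
// Hypothetical implementation of get_data(); the question does not show the original.
function get_data(string $url, int $timeout = 10): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,     // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeout, // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout, // seconds allowed for the whole request
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        // Surface the cURL error instead of silently returning false.
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $body;
}
```

Throwing an exception keeps a failed download from being passed on to the HTML parser as if it were markup.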

After this, I want to get the inner HTML of different HTML elements (such as h1, div, p, span) by using their class or id.

For example, suppose the content from the external URL ($html) looks like this:

<html>
<title></title>
<body>
    <h1 class="title">I am title</h1>
    <div id="content">
        i am the content.
    </div>
</body>

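Now I want to get the inner HTML of the tag with class="title". Likewise, I want to get the inner HTML of the tag whose id is "content".

How can I do this with PHP? I don't know DOM or XML. Please help.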

Answer 1

This is easy. Try:

$dom_doc = new DOMDocument();
$dom_doc->loadHTML($returned_external_html); // the markup fetched from the external URL
// Search by tag name (any tag works: 'img', 'p', etc.). This returns a DOMNodeList.
$tables = $dom_doc->getElementsByTagName('table');
// If you know the id of the element you are looking for, use this. It returns a DOMElement, or null if nothing matches.
$element = $dom_doc->getElementById('specific_id');
// To get the inner HTML of that element, serialize each of its child nodes:
$innerHTML = '';
if ($element !== null) {
    foreach ($element->childNodes as $child) {
        $innerHTML .= $child->ownerDocument->saveXML($child);
    }
}
echo $innerHTML; // contains the inner HTML of the element

See these manual pages for more help:
DOMDocument::getElementsByTagName
DOMDocument::getElementById
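Applied to the markup from the question, the same approach could be wrapped in a small helper. This is a sketch under assumptions: innerHtmlById() is a hypothetical name, it uses saveHTML() rather than saveXML() so the output is HTML-style markup, and it silences libxml parse warnings because real-world pages are rarely valid.

```php
<?php
// Hypothetical helper, not part of the original answer.
function innerHtmlById(string $html, string $id): ?string
{
    $previous = libxml_use_internal_errors(true); // collect parse warnings quietly
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting

    $element = $doc->getElementById($id);
    if ($element === null) {
        return null; // no element carries that id
    }
    $inner = '';
    foreach ($element->childNodes as $child) {
        $inner .= $doc->saveHTML($child);         // serialize each child as HTML
    }
    return $inner;
}

$html = '<html><body><h1 class="title">I am title</h1>'
      . '<div id="content">i am the <b>content</b>.</div></body></html>';
echo innerHtmlById($html, 'content'); // i am the <b>content</b>.
```

Returning null for a missing id lets the caller tell "no such element" apart from an element that is simply empty.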

Answer 2

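There is a function for this, DOMDocument::saveHTML(). In current PHP versions it can serialize a specific node as HTML. To get the inner HTML of a node, you have to save each of its child nodes.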

function getHtml($nodes) {
  $result = '';
  foreach ($nodes as $node) {
    $result .= $node->ownerDocument->saveHtml($node);
  }
  return $result;
}

To fetch the nodes, you can use XPath. The id case is easy.

Fetch all element nodes:

//*

that have an id attribute "content":

//*[@id="content"]

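Use only the first node found, in case someone added the same id more than once. Note that the [1] predicate is applied per parent element, so this takes the first match under each parent; (//*[@id="content"])[1] would take the first match in the whole document.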

//*[@id="content"][1]
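The position predicate deserves a quick check, because in XPath 1.0 a trailing [1] binds to the last location step rather than to the whole result. The sketch below uses made-up markup with a duplicated id "a" (an assumption for illustration) to compare the bare form with the parenthesized form.

```php
<?php
libxml_use_internal_errors(true); // duplicate ids trigger a parser warning; keep it quiet

$dom = new DOMDocument();
$dom->loadHTML('<div><p id="a">1</p></div><div><p id="a">2</p></div>');
$xpath = new DOMXPath($dom);

// First matching child under each parent: both <p> elements qualify.
$perParent = $xpath->evaluate('//*[@id="a"][1]');

// First match in the whole document: exactly one node.
$firstOverall = $xpath->evaluate('(//*[@id="a"])[1]');

echo $perParent->length, "\n";    // 2
echo $firstOverall->length, "\n"; // 1
```

When duplicate ids sit under the same parent, both forms return a single node, so the difference only shows up when the duplicates live in different parents.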

Fetch the child nodes. node() matches elements, text nodes, and several other node types:

//*[@id="content"][1]/node()

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);

echo getHtml($xpath->evaluate('//*[@id="content"][1]/node()'));

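The class attribute is slightly more complicated. A class attribute is a list of tokens, so it can contain several class names. Here is a trick to match them. The XPath function normalize-space() strips leading and trailing whitespace and collapses each run of whitespace into a single space. Add a space at the front and the end and you get a string like " one two three ". Now you can check whether " one " is part of that string. In XPath:

Normalize the class attribute: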

normalize-space(@class)

Add a space at the start and the end:

concat(" ", normalize-space(@class), " ")

Check whether it contains the substring:

contains(concat(" ", normalize-space(@class), " "), " title ")

Use it to restrict the nodes:

//*[contains(concat(" ", normalize-space(@class), " "), " title ")][1]/node()
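The expression above can be generated for any class name. A minimal sketch, assuming a hypothetical helper named classXPath() and made-up sample markup; it rejects names containing whitespace or quotes because those could not be single class tokens and would break the expression.

```php
<?php
// Hypothetical helper: builds the class-matching XPath for one class name.
function classXPath(string $class): string
{
    if ($class === '' || preg_match('/[\s"\']/', $class)) {
        throw new InvalidArgumentException('Invalid class name');
    }
    return sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
}

$dom = new DOMDocument();
$dom->loadHTML('<p class="big  title">A</p><p class="subtitle">B</p>');
$xpath = new DOMXPath($dom);

// "big  title" normalizes to "big title" and matches; "subtitle" does not,
// because the padded token " title " requires a space on both sides.
$matches = $xpath->evaluate(classXPath('title'));
foreach ($matches as $node) {
    echo $node->textContent, "\n"; // A
}
```

A plain contains(@class, "title") would also match "subtitle", which is the reason for padding both the attribute value and the token with spaces.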

Putting it all together:

$html = <<<'HTML'
<html>
<title></title>
<body>
    <h1 class="title">I am title</h1>
    <div id="content">
        i am the <b>content</b>.
    </div>
</body>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);

function getHtml($nodes) {
  $result = '';
  foreach ($nodes as $node) {
    $result .= $node->ownerDocument->saveHtml($node);
  }
  return $result;
}

// first node with the id
var_dump(
  getHtml(
    $xpath->evaluate('//*[@id="content"][1]/node()')
  )
);

// first node with the class
var_dump(
  getHtml(
    $xpath->evaluate(
      '//*[contains(concat(" ", normalize-space(@class), " "), " title ")][1]/node()'
    )
  )
);

// alternative - handling multiple nodes with the same class in a loop
$nodes = $xpath->evaluate(
  '//*[contains(concat(" ", normalize-space(@class), " "), " title ")]'
);
foreach ($nodes as $node) {
  var_dump(getHtml($xpath->evaluate('node()', $node)));
}

Output: https://eval.in/118248

string(40) "
        i am the <b>content</b>.
    "
string(10) "I am title"
string(10) "I am title"

2 answers

Answer 1
1 2014-03-11 16:31:38

Answer 2
1 Accepted 2014-03-11 17:36:01
