
Create array from the contents of <div> tags in PHP

I have the contents of a web page assigned to a variable, $html.

Here's an example of the contents of $html:

<div class="content">something here</div>
<span>something random thrown in <strong>here</strong></span>
<div class="content">more stuff</div>

Using PHP, how can I create an array that holds the contents of the <div class="content"></div> regions, so that for the example above:

echo $array[0] . "\n" . $array[1]; //etc

outputs:

something here
more stuff

Assuming this is just a simplified case in the OP and the real situation is more complicated, you'll want to use XPath.

If it's really complex, then you may want to use DOMDocument (with DOMXPath), but here's a simple example using SimpleXML:

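// SimpleXML needs well-formed XML with a single root element, so wrap the
// fragment; real-world HTML is often not valid XML and will throw here.
$xml = new SimpleXMLElement('<root>' . $html . '</root>');

$result = $xml->xpath('//div[@class="content"]');

// each() was removed in PHP 8, so iterate with foreach instead
foreach ($result as $node) {
    echo $node, "\n";
}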

Since you explicitly asked about creating an array for this, you could use:

$res_Arr = array();
foreach ($result as $node) {
    $res_Arr[] = (string) $node; // cast each SimpleXMLElement to its text
}

and $res_Arr would be an array of strings with the contents you're looking for.

See http://php.net/manual/en/simplexmlelement.xpath.php for PHP's SimpleXML XPath documentation and http://www.w3.org/TR/xpath for the XPath specification.
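For the "really complex" case, DOMDocument with DOMXPath is usually the sturdier choice, because its HTML parser does not require well-formed XML the way SimpleXML does. A minimal sketch, assuming the sample $html from the question; the function name extractContentDivs is hypothetical:

```php
<?php
// Sketch: collect the text of every <div class="content"> into an array
// using DOMDocument + DOMXPath. extractContentDivs is a hypothetical helper.
function extractContentDivs(string $html): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html); // HTML parser: tolerates markup that isn't valid XML

    $xpath = new DOMXPath($dom);
    $result = [];
    foreach ($xpath->query('//div[@class="content"]') as $node) {
        $result[] = $node->textContent; // includes text of any child elements
    }
    return $result;
}

$html = '<div class="content">something here</div>
<span>something random thrown in <strong>here</strong></span>
<div class="content">more stuff</div>';

$array = extractContentDivs($html);
echo $array[0] . "\n" . $array[1];
```

Note that @class="content" is an exact string comparison, so a div with class="content other" would not match.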

PHP has several means of processing HTML, including DomDocument and SimpleXML. See Parse HTML With PHP And DOM. Here is an example:

$dom = new DomDocument;
$dom->preserveWhiteSpace = false; // parser setting: set it before loading
$dom->loadHTML($html);
$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
  $class = $div->getAttribute('class');
  if ($class == 'content') {
    echo $div->nodeValue . "\n";
  }
}

Technically the class attribute can hold several space-separated classes, so you might want to use:

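// Inside the loop above, instead of comparing $class == 'content':
$classes = preg_split('/\s+/', trim($class)); // tolerate repeated whitespace
if (in_array('content', $classes, true)) {
  echo $div->nodeValue . "\n";
}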

The SimpleXML/XPath approach is more concise, but if you don't want to go the XPath route (and learn another technology, at least well enough to do these sorts of tasks), then the above is a programmatic alternative.
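If you do take the XPath route, the multiple-class issue can be handled inside the query itself. The sketch below uses the common XPath 1.0 idiom contains(concat(" ", normalize-space(@class), " "), " content "), which matches content as a whole class token but not, say, contents. The sample markup is made up for illustration:

```php
<?php
// Hypothetical sample: a two-class div and a near-miss class name.
$html = '<div class="content">something here</div>
<div class="content highlight">more stuff</div>
<div class="contents">not this one</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Pad the normalized class list with spaces so " content " only matches a
// whole class token, never a substring such as "contents".
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " content ")]';

$array = [];
foreach ($xpath->query($query) as $div) {
    $array[] = $div->nodeValue;
}
echo implode("\n", $array), "\n";
```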

You probably need to use preg_match_all():

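$matches = array();
// [^>]* keeps each match inside one opening tag; the lazy (.*?) stops at the
// first </div>, so nested <div>s inside the content will break this.
// (The original U flag inverted greediness and made .*? greedy, so it is dropped.)
preg_match_all('`<div[^>]*class="content"[^>]*>(.*?)</div>`is', $html, $matches, PREG_SET_ORDER);
foreach ($matches as $m) {
  echo $m[1] . "\n"; // $m[1] is the content of each <div class="content">
}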

There's not much you can do short of using string-manipulation functions or regular expressions. You can load your HTML as XML using the DOM library and use that to traverse to your div, but that can become cumbersome if you're not careful or if the structure is complex.

http://ca3.php.net/manual/en/book.dom.php
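For comparison, here is a minimal sketch of the plain string-function route, assuming every target div opens with exactly <div class="content"> and contains no nested <div>. The function name extractWithStrings is hypothetical. It returns the raw inner HTML, so wrap each result in strip_tags() if you only want the text:

```php
<?php
// Sketch of the string-manipulation route using strpos()/substr().
// Assumes an exact opening tag and no nested <div> inside the content.
function extractWithStrings(string $html): array
{
    $open = '<div class="content">';
    $close = '</div>';
    $result = [];
    $offset = 0;
    while (($start = strpos($html, $open, $offset)) !== false) {
        $start += strlen($open);              // skip past the opening tag
        $end = strpos($html, $close, $start); // first closing tag after it
        if ($end === false) {
            break; // unterminated div: stop scanning
        }
        $result[] = substr($html, $start, $end - $start);
        $offset = $end + strlen($close);      // resume after this </div>
    }
    return $result;
}

$html = '<div class="content">something here</div>
<span>something random thrown in <strong>here</strong></span>
<div class="content">more stuff</div>';

$array = extractWithStrings($html);
echo $array[0] . "\n" . $array[1];
```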

It looks like Kalem13 beat me to it, but I agree: you could use the DOMDocument class. I haven't used it personally, but I think it would work for you. First you instantiate a DOMDocument object, then you load your $html variable using the loadHTML() method. Then you can use the getElementsByTagName() method.
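The three steps described above can be sketched as follows, this time collecting the results into an array as the question asks. The libxml_use_internal_errors() calls are an assumption added here, not part of the answer; they keep loadHTML() from printing warnings on messy real-world markup:

```php
<?php
$html = '<div class="content">something here</div>
<span>something random thrown in <strong>here</strong></span>
<div class="content">more stuff</div>';

$dom = new DOMDocument();                       // 1. instantiate
$previous = libxml_use_internal_errors(true);   // buffer parser warnings
$dom->loadHTML($html);                          // 2. load the markup
libxml_clear_errors();                          // discard buffered warnings
libxml_use_internal_errors($previous);          // restore the old setting

$array = [];
foreach ($dom->getElementsByTagName('div') as $div) {  // 3. find the <div>s
    if ($div->getAttribute('class') === 'content') {
        $array[] = $div->nodeValue;
    }
}
echo $array[0] . "\n" . $array[1];
```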

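Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.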

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM