Extracting data from HTML tags

Question

I have the following code, which tries to extract the value of the content attribute from an HTML page. It does not give the result I expect; instead it produces only a blank page.

Where could the issue be?

$url = "https://fr-ca.wordpress.org";
$html = file_get_contents($url);

# Create a DOM parser object
$dom = new DOMDocument();
$dom->loadHTML($html);

foreach ($dom->getElementsByTagName('meta') as $key) {
    echo "<pre>";
    $tab[] = $key->getAttribute('content');
}

$reg = '<meta name="generator" content="(.*?)"/>';
if (preg_match_all($reg, $html, $ar)) {
    print_r($ar);
}

The page source contains:

<meta name="generator" content="WP 4.5"/>

Answer 1

Try this:

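$html = '<meta name="generator" content="WP 4.5"/>';
// Non-greedy (.*?) stops at the first closing quote instead of the last one in the input
preg_match_all('/content="(.*?)"/i', $html, $matches);
// $matches[1] is always set after preg_match_all, so test for emptiness instead
if (!empty($matches[1])) {
    print_r($matches[1]);
}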

Answer 2

Here is a regex that looks for a meta tag and captures the value of its content attribute. It uses wildcards to account for variations such as different name values or extra spaces.

$html = '<meta name="generator" content="WP 4.5"/>';

// Search $html (the original passed the undefined variable $tab here)
preg_match_all( '#<meta.*?content=[\'"](.*?)[\'"]\s*/>#i', $html, $results );
print_r( $results[1] ); // contains array of captures.
if( $results[1] ) {
    // code here...
}
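As a rough sketch, the pattern above could be wrapped in a small reusable helper. The function name `extractMetaContent` is hypothetical and not part of the original answer, and the sketch assumes the `name` attribute comes directly before `content` inside the tag, as in the page source shown in the question.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Assumes name="..." comes directly before content="..." inside the <meta> tag.
function extractMetaContent(string $html, string $name): ?string
{
    $pattern = '#<meta\s+name=[\'"]' . preg_quote($name, '#')
             . '[\'"]\s+content=[\'"](.*?)[\'"]\s*/?>#i';

    // preg_match returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

$html = '<meta name="generator" content="WP 4.5"/>';
var_dump(extractMetaContent($html, 'generator')); // string(6) "WP 4.5"
```

Returning `null` when nothing matches lets the caller tell "tag not found" apart from an empty content value. A regex like this is only as robust as its assumptions about attribute order; a DOM parser avoids that limitation.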

Answer 3

Use it like this:

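$html = file_get_contents( $url);

libxml_use_internal_errors( true); // suppress warnings from malformed HTML
$doc = new DOMDocument;
$doc->loadHTML( $html);
$xpath = new DOMXpath( $doc);

// Select the <meta name="generator"> element from the question
$node = $xpath->query( '//meta[@name="generator"]')->item( 0);

// item() returns null when nothing matched
if ($node !== null) {
    echo $node->getAttribute( 'content');
}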

Or:

// Use cURL ...

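// find() comes from the PHP Simple HTML DOM Parser library;
// the include path below is an assumption about where it is installed
require_once 'simple_html_dom.php';

function getHTML($url, $timeout)
{
    $ch = curl_init($url); // initialize curl with given url
    // set user agent; fall back to a fixed string when not run via a web request
    curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER["HTTP_USER_AGENT"] ?? 'Mozilla/5.0');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects if any
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // max. seconds to wait while connecting
    curl_setopt($ch, CURLOPT_FAILONERROR, true); // fail on HTTP status codes >= 400
    return curl_exec($ch); // response body, or false on failure
}

$html = getHTML("http://www.website.com", 10);
if ($html === false) {
    exit('Request failed');
}

// getHTML() returns a plain string, so parse it before calling find()
$dom = str_get_html($html);

// Find all images on webpage
foreach ($dom->find("img") as $element) {
    echo $element->src . '<br>';
}

// Find all links on webpage
foreach ($dom->find("a") as $element) {
    echo $element->href . '<br>';
}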
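For comparison, here is a minimal sketch of the same kind of extraction using only PHP's bundled DOM extension, with no third-party parser. The helper name `attributeValues` and the sample markup are hypothetical (they do not come from the answers above), and the sketch assumes the dom extension is enabled.

```php
<?php
// Hypothetical helper, not from the original answers.
// Collects the value of $attr from every <$tag> element that has it.
function attributeValues(string $html, string $tag, string $attr): array
{
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $values = [];
    foreach ($doc->getElementsByTagName($tag) as $node) {
        if ($node->hasAttribute($attr)) {
            $values[] = $node->getAttribute($attr);
        }
    }
    return $values;
}

// Sample markup made up for illustration
$html = '<html><head><meta name="generator" content="WP 4.5"/></head>'
      . '<body><a href="/about">About</a><img src="logo.png"></body></html>';

print_r(attributeValues($html, 'meta', 'content'));
print_r(attributeValues($html, 'a', 'href'));
print_r(attributeValues($html, 'img', 'src'));
```

The same function covers the meta tag from the question as well as the image and link loops, because only the tag and attribute names differ between the three cases.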

3 answers

Answer 1
1 2017-12-26 06:38:20

Answer 2
1 2017-12-26 06:57:47

Answer 3
0 2017-12-26 07:19:44
