How to extract a specific string from HTML source code using PHP

Question

I am trying to extract a specific string from the full HTML source of a page.

HTML source: view-source:https://www.instagram.com/p/BUbZXXMjnxY/?taken-by=narentrigger&hl=en

String to extract: https://instagram.fmaa1-2.fna.fbcdn.net/t51.2885-15/e35/18645014_163619900839441_7821159798480568320_n.jpg, which comes from the "og:image" meta property.

I have tried a few approaches, but all of them failed. Is there a way to get the image link from the og:image meta property in the source? After extraction, the image URL needs to be stored in a variable. Any expert help would be appreciated.

Answer 1

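If you only need a single substring, don't use preg_match_all(). Loading DOMDocument also seems like overkill for this task.

By using \K, you can reduce bloat in the result array.

Sample input: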

$input='<meta property="og:title" content="Instagram post by Narendiran blah blah" />
<meta property="og:image" content="https://instagram.fmma1-2.blah.jpg" />
<meta property="og:description" content="8 Likes, 1 Comments - blah" />';

Method:

$url=preg_match('/"og:image"[^"]+"\K[^"]+/',$input,$out)?$out[0]:null;
echo $url;

Output:

https://instagram.fmma1-2.blah.jpg

Using the negated character class [^"] lets the regex engine run more efficiently.
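To make the point about \K concrete, here is a minimal sketch comparing it with an ordinary capture group on the same sample input. The variable names $withGroup and $withK are made up for this illustration.

```php
<?php
// Sample markup; the URL is the shortened placeholder used in the answer above.
$input = '<meta property="og:title" content="Instagram post by Narendiran blah blah" />
<meta property="og:image" content="https://instagram.fmma1-2.blah.jpg" />
<meta property="og:description" content="8 Likes, 1 Comments - blah" />';

// With a capture group, the result holds the full match plus the group.
preg_match('/"og:image"[^"]+"([^"]+)/', $input, $withGroup);

// With \K, everything matched before it is dropped from the reported match,
// so the result holds a single element.
preg_match('/"og:image"[^"]+"\K[^"]+/', $input, $withK);

echo count($withGroup), "\n"; // 2
echo count($withK), "\n";     // 1
echo $withK[0], "\n";         // https://instagram.fmma1-2.blah.jpg
```

Both patterns find the same URL; the only difference is how many elements end up in the output array.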

Answer 2

Assuming you have the markup in a PHP string, what's wrong with a regex?

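// Negated classes keep the match inside one tag and one attribute value,
// so this also works when several meta tags share a line.
preg_match_all('/<meta[^>]*property="og:image"[^>]*content="([^"]*)"[^>]*>/', $string, $matches);
echo $matches[1][0];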

Disclaimer: a more efficient regex could probably be written.
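As a rough illustration of why the choice of quantifier matters here, the sketch below runs a greedy `.*` pattern and a negated-character-class pattern against single-line markup. The markup and the example.com URL are hypothetical, chosen so that two meta tags sit on the same line.

```php
<?php
// Hypothetical single-line markup with two meta tags on the same line.
$string = '<meta property="og:image" content="https://example.com/a.jpg" /><meta property="og:description" content="8 Likes" />';

// Greedy: .* runs to the end of the line and backtracks to the LAST content="...".
preg_match('/<meta.*property="og:image".*content="(.*)".*\/>/', $string, $greedy);

// Negated classes: [^>]* cannot leave the current tag, [^"]* cannot leave the value.
preg_match('/<meta[^>]*property="og:image"[^>]*content="([^"]*)"/', $string, $bounded);

echo $greedy[1], "\n";  // 8 Likes
echo $bounded[1], "\n"; // https://example.com/a.jpg
```

When each meta tag sits on its own line the greedy version happens to work, because `.` does not match a newline by default; minified pages offer no such guarantee.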

Answer 3

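In this snippet I use DOMDocument to grab the content attribute from the meta tags. The values are stored in an array, just in case, and returned. Hope it works.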

function get_img_url($url) {
    // Create a new DOM object
    $html = new DOMDocument();

    // Load the HTML page
    $html->loadHTMLFile($url);

    // Create an empty array
    $imageArray = array();

    // Loop through each meta tag and keep only the og:image entries
    foreach ($html->getElementsByTagName('meta') as $meta) {
        if ($meta->getAttribute('property') === 'og:image') {
            $imageArray[] = array('url' => $meta->getAttribute('content'));
        }
    }

    // Return the list
    return $imageArray;
}
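A possible variation on the DOMDocument approach, sketched under a few assumptions: the markup is already in a string, only the first og:image value is wanted, and parser warnings from real-world HTML should be kept quiet. The helper name get_og_image is hypothetical, and DOMXPath is swapped in for the loop over getElementsByTagName().

```php
<?php
// Hypothetical helper; takes markup as a string instead of a URL.
function get_og_image(string $markup): ?string
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($markup);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select only the content attribute of the og:image meta tag.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//meta[@property="og:image"]/@content');

    // Return the first match, or null when the tag is absent.
    return $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
}

$markup = '<html><head>
<meta property="og:title" content="Instagram post by Narendiran blah blah" />
<meta property="og:image" content="https://instagram.fmma1-2.blah.jpg" />
</head><body></body></html>';

$imageUrl = get_og_image($markup);
echo $imageUrl, "\n";
```

Returning a single string (or null) matches the requirement of storing the URL in one variable; the array-returning version may still be preferable if a page can carry several og:image tags.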

Answer 4

Try this code to scrape the web page. I used simple_html_dom_parser, which you can download from https://sourceforge.net/projects/simplehtmldom/files/

include_once("simple_html_dom.php");

// Download the page into a local file
$output_filename = "example_homepage.html";
$fp = fopen($output_filename, 'w');
$url = 'https://www.instagram.com/p/BUbZXXMjnxY/?taken-by=narentrigger&hl=en';
$curl = curl_init();

curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, false);
curl_setopt($curl, CURLOPT_FILE, $fp); // write the response body to $fp
$result = curl_exec($curl);

curl_close($curl);
fclose($fp);

// Parse the saved file and print the content of every og:image meta tag
$html = file_get_html($output_filename);

foreach ($html->find('meta[property=og:image]') as $element) {
    echo $element->content . '<br>';
}

4 solutions

Solution 1
1 Accepted 2017-05-24 06:06:01

Solution 2
0 2017-05-23 20:33:18

Solution 3
0 2017-05-23 21:29:18

Solution 4
0 2017-05-24 09:23:50
