
How to extract a particular string from HTML source code using PHP

I'm trying to extract a particular string from a page's full HTML source code.

HTML Source: view-source: https://www.instagram.com/p/BUbZXXMjnxY/?taken-by=narentrigger&hl=en

String to extract: https://instagram.fmaa1-2.fna.fbcdn.net/t51.2885-15/e35/18645014_163619900839441_7821159798480568320_n.jpg from the "og:image" meta property.

I have tried several methods, but none of them worked. Is there any way to grab the image link from the og:image meta property in the source code? After extracting it, I need to store the image URL in a variable. Expert help would be appreciated.

Don't use preg_match_all() if you are only grabbing one substring. Loading a DOMDocument seems like overkill for this task.

By using \K you can reduce result array bloat: the reported match restarts at that point, so no capture group is needed.

Sample Input:

$input='<meta property="og:title" content="Instagram post by Narendiran blah blah" />
<meta property="og:image" content="https://instagram.fmma1-2.blah.jpg" />
<meta property="og:description" content="8 Likes, 1 Comments - blah" />';

Method:

$url=preg_match('/"og:image"[^"]+"\K[^"]+/',$input,$out)?$out[0]:null;
echo $url;

Output:

https://instagram.fmma1-2.blah.jpg

The regex engine runs more efficiently with a negated character class, [^"], than with a dot.
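To make the effect of \K concrete, here is a minimal sketch that runs the lookup on the sample input both with a capture group and with \K, then compares the size of the result arrays. The helper name extract_og_image is hypothetical (it is not part of the original answer), and the pattern assumes that the property attribute appears before content, as in the sample input.

```php
<?php
// Hypothetical helper (not from the original answer): returns the og:image
// URL found in an HTML string, or null when no match is found.
// Assumes the property attribute comes before content, as in the sample input.
function extract_og_image(string $html): ?string
{
    // \K discards everything matched so far, so $out[0] is just the URL.
    return preg_match('/"og:image"[^"]+"\K[^"]+/', $html, $out) ? $out[0] : null;
}

$input = '<meta property="og:title" content="Instagram post by Narendiran blah blah" />
<meta property="og:image" content="https://instagram.fmma1-2.blah.jpg" />
<meta property="og:description" content="8 Likes, 1 Comments - blah" />';

// With a capture group the result array holds two elements:
// the full match and the captured URL.
preg_match('/"og:image"[^"]+"([^"]+)/', $input, $withGroup);

// With \K the result array holds a single element: the URL.
preg_match('/"og:image"[^"]+"\K[^"]+/', $input, $withK);

echo count($withGroup), ' ', count($withK), "\n"; // prints "2 1"
echo extract_og_image($input), "\n";
```

Returning null on a failed match lets the caller tell "tag not found" apart from an empty URL.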

Assuming you have the markup inside a string in PHP, what's wrong with a regex?

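// [^>]* keeps the match inside a single <meta> tag; [^"]* stops at the closing quote
if (preg_match_all('/<meta[^>]*property="og:image"[^>]*content="([^"]*)"[^>]*>/', $string, $matches)) {
    echo $matches[1][0];
}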


Disclaimer: more efficient regexes may be available.

In this code snippet I'm using DOMDocument to scrape the content attribute from the meta tag. It stores the values in an array, in case there is more than one, and returns it. Hope it works.

    function get_img_url($url) {

        // Create a new DOM object
        $html = new DOMDocument();

        // Keep warnings about malformed real-world HTML from being printed
        libxml_use_internal_errors(true);

        // Load the HTML page
        $html->loadHTMLFile($url);

        // Create an empty array
        $imageArray = array();

        // Loop through each meta tag, keeping only the og:image ones
        foreach ($html->getElementsByTagName('meta') as $meta) {
            if ($meta->getAttribute('property') === 'og:image') {
                $imageArray[] = array('url' => $meta->getAttribute('content'));
            }
        }

        // Return the list
        return $imageArray;
    }
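If the markup is already in a string, the same DOM-based idea can be sketched with DOMXPath, which selects the og:image tag directly instead of looping over every meta element. This is only an illustration under stated assumptions: the function name og_image_from_html and the sample markup are hypothetical, and the HTML is assumed to have been fetched elsewhere.

```php
<?php
// Hypothetical helper: parses an HTML string (already fetched elsewhere) and
// returns the content of the first og:image meta tag, or null if none exists.
function og_image_from_html(string $html): ?string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Select the content attribute of <meta> elements whose property is og:image.
    $nodes = $xpath->query('//meta[@property="og:image"]/@content');

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->nodeValue;
}

// Made-up sample markup for illustration only.
$sample = '<html><head>
<meta property="og:title" content="Example title" />
<meta property="og:image" content="https://example.com/picture.jpg" />
</head><body></body></html>';

$imageUrl = og_image_from_html($sample);
echo $imageUrl, "\n";
```

Unlike the regex approaches, this does not depend on the order of the property and content attributes.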

Try this code to scrape the webpage. I used simple_html_dom_parser. You can download it from https://sourceforge.net/projects/simplehtmldom/files/

include_once("simple_html_dom.php");

$output_filename = "example_homepage.html";
$fp = fopen($output_filename, 'w');
$url = 'https://www.instagram.com/p/BUbZXXMjnxY/?taken-by=narentrigger&hl=en';
$curl = curl_init();

curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, false);
curl_setopt ($curl, CURLOPT_FILE, $fp);
$result = curl_exec($curl);

curl_close($curl);
fclose($fp);

$html = file_get_html('example_homepage.html');

foreach($html->find('meta[property=og:image]') as $element) 
   echo $element->content . '<br>';


 