
Replace Relative linked images with absolute path images in a string using JavaScript

I have a Google Chrome extension I am building for adding new bookmarks to my bookmarks app.

One feature of my bookmark app is saving a screenshot of the web page along with up to 3 additional images.

In the Chrome extension, each of the 3 additional images is shown as a text input that accepts an image URL.

Under each input, I scrape the web page HTML to find all images in the page and show them in a slider with previous and next arrow buttons, so the user can cycle through every image on the page. If the user likes one of the images, they can select it in the slider, which converts the image to a Base64-encoded string and uploads it to my remote bookmark app server.

My problem is that in the image selector where I show the images from the web page, it shows a broken image for any image that was in the page and was linked with a relative path instead of a full path with a domain name in it.

(In the animated GIF below, the last of the 4 images shown is a broken image.)
[image: enter image description here]

If I view the page source and see a relative linked image like this...

[image: enter image description here]

Then this image shows as a broken image in the image selector/slider in my extension, because the relative path is resolved against the extension URL, and the image ends up linked like this...

[image: enter image description here]

Below is my JavaScript function which scrapes the HTML and grabs the images found in the page.

I need to detect when an image URL is relative and then put the page URL in front of it to make it an absolute image URL.

Any ideas how to achieve this?

Relative image URLs currently end up linking to the image with this as the "domain": chrome-extension://pcfibleldhbmpjaaebaplofnlodfldfj

I need to instead inject the URL of the web page in front of all relative linked images.
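A minimal sketch of that idea, assuming the page's address is available as a string. The names `toAbsoluteUrl` and `pageUrl` and the example.com addresses are hypothetical and not part of the extension. The standard `URL` constructor resolves a relative reference against a base URL and leaves an already absolute URL unchanged:

```javascript
// Hypothetical helper: resolve an image's raw src value against the page URL.
// Returns null when the inputs cannot be parsed as a URL.
function toAbsoluteUrl(src, pageUrl) {
  try {
    return new URL(src, pageUrl).href;
  } catch (e) {
    return null;
  }
}

var pageUrl = 'https://example.com/articles/post.html'; // assumed page address

console.log(toAbsoluteUrl('/assets/logo.png', pageUrl));
// https://example.com/assets/logo.png
console.log(toAbsoluteUrl('images/photo.jpg', pageUrl));
// https://example.com/articles/images/photo.jpg
console.log(toAbsoluteUrl('https://cdn.example.net/a.png', pageUrl));
// https://cdn.example.net/a.png
```

In the scraping function, the value to pass in would be the raw attribute, `$(img).attr('src')`, rather than `img.src`: the `src` property has already been resolved against the extension's own URL by the time it is read.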

In my JS function below, at the point where it saves the image URL to an array, img.src looks like this for relative URLs...

[image: enter image description here]

So if I could simply replace chrome-extension://pcfibleldhbmpjaaebaplofnlodfldfj with the web page URL, that would fix my problem.

The ID in the Chrome extension URL can differ, though, so the replacement would need to match the chrome-extension:// pattern rather than one fixed string.

JavaScript function to get all images in an HTML string:

/**
 * Scrape webpage and get all images found in HTML
 * @param  string $htmlSource - HTML string of the webpage HTML
 * @return array - array of HTML strings with list items and images inside each list item
 */
scrapeWebpageForImages: function($htmlSource) {
    // HTML source code of the webpage passed into jQuery so we can work on it as an object
    var $html = $($htmlSource);

    // All images
    var images = $('img', $html),
      scanned = 0,
      filtered = [],
      ogtmp = '',
      srcs = {};

    // Grab the open graph image
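    var ogimage = $('meta[property="og:image"]', $html);
    if( ogimage.length > 0 ) {
      ogtmp = $('<img>').prop({
        // A <meta> tag keeps its value in the content attribute; it has no text
        'src': ogimage.attr('content'),
        'class': 'opengraph',
        'width': 1000, // High priority
        'height': 1000
      });
      // Push the DOM element, not the jQuery wrapper, so img.src works in the loop
      images.push(ogtmp[0]);
    }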

    var i = 0,
      l = images.length,
      result = '',
      img;

    // Cycle through all images
    for(; i < l; i++) {
      scanned += 1;
      img = images[i];

      // Have we seen this image already?
      if( !! srcs[$(img, $html).attr('src')] ) {
        // Yep, skip it
        continue;
      } else {

        //////////////////////////////////////
        ///
        ///  NEED TO DETECT A RELATIVE LINKED IMAGE AND REPLACE WITH ABSOLUTE LINKED IMAGE URL
        ///  USING THE WEBPAGE URL
        ///  
        //////////////////////////////////////


        // Nope, remember it
        srcs[$(img, $html).attr('src')] = true;
        result = '<li><img src="'+img.src+'" title="'+img.alt+'"></li>';
        filtered.push(result);
      }
    } // end for loop

    return filtered;
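},

A regular expression can match the extension origin whatever the extension ID is, so that it can be swapped for the page's base URL:

var url = "chrome-extension://pcfibleldhbmpjaaebaplofnlodfldfj/assets/xyz";
// Match the extension origin: the scheme followed by the extension ID
var myRe = /^chrome-extension:\/\/\w+/;
var match = myRe.exec(url);

// exec() returns null when nothing matches, so test the result itself
if (match) {
  // Pattern matched: keep only the path that follows the extension origin
  var path = url.substring(match[0].length);
  url = 'whatever your base url is' + path;
} else {
  console.log('Did not find a url.');
}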
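As an alternative to the regular expression above, the same swap could be sketched with the standard `URL` class, which parses out the path regardless of the extension ID. The function name `swapExtensionOrigin` and the example.com address are hypothetical:

```javascript
// Hypothetical helper: move a chrome-extension:// image URL onto the page's origin.
// URLs with any other scheme, or strings that do not parse, are returned unchanged.
function swapExtensionOrigin(imgSrc, pageUrl) {
  var parsed;
  try {
    parsed = new URL(imgSrc);
  } catch (e) {
    return imgSrc; // not an absolute URL
  }
  if (parsed.protocol !== 'chrome-extension:') {
    return imgSrc;
  }
  // Re-resolve the path (plus any query string and fragment) against the page URL
  return new URL(parsed.pathname + parsed.search + parsed.hash, pageUrl).href;
}

var fixed = swapExtensionOrigin(
  'chrome-extension://pcfibleldhbmpjaaebaplofnlodfldfj/assets/xyz',
  'https://example.com/blog/post' // assumed page address
);
console.log(fixed); // https://example.com/assets/xyz
```

One caveat with any origin swap: a path-relative src such as images/a.png has already been resolved against the extension page's own directory, so the page's directory cannot be recovered from the result. Resolving the raw src attribute against the page URL avoids that problem.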
