
How to read the HTML content of a specific URL using a Firefox add-on?

I want to create an add-on that loads the HTML content of a specific URL, saves a specific line of that page, and then moves to that URL. I have read a lot on Mozilla.org about the content of a web page, but I don't understand how to read the HTML content.

Here's a simple snippet that makes an XHR request WITHOUT cookies. Don't worry about cross-origin restrictions: you are running from a privileged scope, meaning you aren't coding this in a website but in a Firefox add-on.

var {utils: Cu, classes: Cc, interfaces: Ci} = Components;
Cu.import('resource://gre/modules/Services.jsm');

// Fetch `url` anonymously (no cookies, no cache) and pass the response text to `cb`.
function xhr(url, cb) {
    let xhr = Cc["@mozilla.org/xmlextras/xmlhttprequest;1"].createInstance(Ci.nsIXMLHttpRequest);

    let handler = ev => {
        // Whatever happened, stop listening for further events.
        evf(m => xhr.removeEventListener(m, handler, false));
        switch (ev.type) {
            case 'load':
                if (xhr.status == 200) {
                    cb(xhr.response);
                    break;
                }
                // Non-200 status: fall through and report it as an error.
            default:
                Services.prompt.alert(null, 'XHR Error', 'Error fetching URL: ' + xhr.statusText + ' [' + ev.type + ':' + xhr.status + ']');
                break;
        }
    };

    // Run `f` once for each event type we care about.
    let evf = f => ['load', 'error', 'abort'].forEach(f);
    evf(m => xhr.addEventListener(m, handler, false));

    xhr.mozBackgroundRequest = true;
    xhr.open('GET', url, true);
    // Send no cookies, bypass the cache and don't store the response on disk.
    xhr.channel.loadFlags |= Ci.nsIRequest.LOAD_ANONYMOUS | Ci.nsIRequest.LOAD_BYPASS_CACHE | Ci.nsIRequest.INHIBIT_PERSISTENT_CACHING;
    // Leave responseType unset so the response is a string. Only set
    // xhr.responseType = "arraybuffer" for binary downloads, e.g. a zip file
    // you want to turn into an nsIArrayBufferInputStream.
    xhr.send(null);
}

Example usage of this snippet:

var href = 'http://www.bing.com/'
xhr(href, data => {
    Services.prompt.alert(null, 'XHR Success', data);
});
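The callback receives the page's HTML as one string, so the "specific line" from the question can be picked out with ordinary string methods. A minimal sketch, assuming the line can be recognized by a marker substring; `findLine` and the marker text are hypothetical:

```javascript
// Hypothetical helper: return the first line of `html` that contains
// `marker`, trimmed of surrounding whitespace, or null if no line matches.
function findLine(html, marker) {
  // Split on both Unix (\n) and Windows (\r\n) line endings.
  const lines = html.split(/\r?\n/);
  const line = lines.find(l => l.includes(marker));
  return line === undefined ? null : line.trim();
}

// Usage inside the xhr() callback (the marker text is an assumption):
// xhr(href, data => {
//   const line = findLine(data, 'window.location');
//   if (line !== null) Services.prompt.alert(null, 'Found line', line);
// });
```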

Without knowing the page and the URL to find on it, I can't create a complete solution, but here's an example Greasemonkey script I wrote that does something similar.

This script is for Java articles on DZone. When an article has a link to its source, the script redirects to that source page:

// ==UserScript==
// @name        DZone source
// @namespace   com.kwebble
// @description Directly go to the source of a DZone article.
// @include     http://java.dzone.com/*
// @version     1
// @grant       none
// ==/UserScript==

var node = document.querySelector('a[target="_blank"]');

if (node !== null) {
    document.location = node.getAttribute('href');
}

Usage:

  • Install Greasemonkey if you haven't yet.
  • Create a script similar to mine. Set the value of @include to the page that contains the URL to find.
  • Determine what identifies the part of the page that holds the destination URL, and change the script to find that URL. In my script it's a link with target="_blank".

After saving the script, visit the page with the link. Greasemonkey should execute your script and redirect the browser.
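The part that changes from page to page is the selector. A sketch of the DZone logic as a reusable function, assuming the destination is the href of the first element matching a CSS selector you choose; `findDestination` is a hypothetical name, and the standard `URL` constructor turns a relative link into an absolute one:

```javascript
// Hypothetical helper generalizing the DZone script: find the first element
// matching `selector` in `doc` and return its href resolved against the
// page's own URL, or null when nothing usable matches.
function findDestination(doc, selector) {
  const node = doc.querySelector(selector);
  if (node === null) {
    return null;
  }
  const href = node.getAttribute('href');
  if (href === null) {
    return null; // matched element has no href attribute
  }
  // Relative links such as "/source/page.html" become absolute URLs.
  return new URL(href, doc.location.href).href;
}

// In the userscript itself (the selector is an assumption for your page):
// const target = findDestination(document, 'a[target="_blank"]');
// if (target !== null) document.location = target;
```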

[edit] This version searches the page's script tags for text like the one you described and redirects to the URL it finds.

// ==UserScript==
// @name        Test
// @namespace   com.kwebble
// @include     your_page
// @version     1
// @grant       none
// ==/UserScript==

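var nodes = document.getElementsByTagName('script'),
    i, matches;

for (i = 0; i < nodes.length; i++) {
    // textContent is the inline script's source; external scripts are empty.
    if (nodes.item(i).textContent !== '') {
        // Capture the whole URL, including the ".php" extension.
        matches = nodes.item(i).textContent.match(/windows?\.location = "(.*?\.php)";/);

        if (matches !== null) {
            document.location = matches[1];
            break; // stop after the first redirect target
        }
    }
}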

The regular expression to find the URL might need some tweaking to match the exact page content.
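For example, a somewhat more forgiving pattern could look like the sketch below. It is an assumption about what the page contains, not a verified match for it: it accepts `window.location` or `windows.location`, an optional `.href`, any spacing around `=`, either quote style, and keeps `.php` in the captured URL. `extractRedirect` is a hypothetical name.

```javascript
// Hypothetical, more tolerant redirect pattern:
//   windows?          -> "window" or "windows"
//   (?:\.href)?       -> optional ".href"
//   \s*=\s*           -> any spacing around "="
//   (["'])...\1       -> same quote character on both sides
//   ([^"']*?\.php)    -> the URL, including its ".php" extension
const REDIRECT_PATTERN = /windows?\.location(?:\.href)?\s*=\s*(["'])([^"']*?\.php)\1/;

// Return the redirect URL found in `scriptText`, or null if there is none.
function extractRedirect(scriptText) {
  const matches = scriptText.match(REDIRECT_PATTERN);
  return matches === null ? null : matches[2];
}
```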

An add-on and a Greasemonkey script take a similar approach, but an add-on can use native Firefox APIs (at the cost of being a lot more complicated to write than a script).

Basically, this is the process (without knowing your exact requirements):

  1. Get the content of the remote URL with XMLHttpRequest()

  2. Extract the data you need with a regular expression or DOMParser()

  3. Navigate to the target URL with location.replace()
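Those three steps can be sketched as one function. This is a minimal sketch, not add-on-ready code: it assumes a standard `fetch()` (as in a Firefox WebExtension, where fetching another origin typically needs a matching host permission in the manifest), assumes the target appears in the page as a `window.location` assignment, and uses a regular expression for step 2. `followEmbeddedRedirect`, `fetchImpl` and `navigate` are hypothetical names; the last two are parameters so the flow can be exercised without a network. The default `navigate` assumes the code runs in the tab being redirected.

```javascript
// Hypothetical sketch of the fetch -> extract -> redirect process.
async function followEmbeddedRedirect(pageUrl, {
  fetchImpl = globalThis.fetch,
  navigate = url => window.location.replace(url),
} = {}) {
  // 1. Get the HTML of the remote page as text, without cookies or cache.
  const response = await fetchImpl(pageUrl, { credentials: 'omit', cache: 'no-store' });
  if (!response.ok) {
    throw new Error(`Fetching ${pageUrl} failed: ${response.status}`);
  }
  const html = await response.text();

  // 2. Extract the target with a regular expression (DOMParser is the
  //    alternative when the data sits in ordinary markup).
  const match = html.match(/window\.location(?:\.href)?\s*=\s*["']([^"']+)["']/);
  if (match === null) {
    return null; // nothing to follow
  }
  // Resolve a relative target against the page it was found on.
  const target = new URL(match[1], pageUrl).href;

  // 3. Go there.
  navigate(target);
  return target;
}
```

Using `location.replace()` rather than assigning to `location` keeps the intermediate page out of the session history, so the Back button skips it.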
