How to read html content of a specific URL using Firefox addon?

I want to create an add-on that loads the HTML content of a specific URL, saves a specific line of that page, and then navigates to that URL. I have read a lot on mozilla.org about web page content, but I don't understand how to read the HTML content.

Here's a simple snippet that makes an XHR request WITHOUT cookies. You don't need to worry about cross-origin restrictions because the code runs in privileged scope: it is part of a Firefox add-on, not a website.

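// Destructure the usual shorthands: the property name goes on the left, the local alias on the right.
var {utils: Cu, classes: Cc, interfaces: Ci} = Components;
Cu.import('resource://gre/modules/Services.jsm');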
function xhr(url, cb) {
    let xhr = Cc["@mozilla.org/xmlextras/xmlhttprequest;1"].createInstance(Ci.nsIXMLHttpRequest);

    let handler = ev => {
        evf(m => xhr.removeEventListener(m, handler, !1));
        switch (ev.type) {
            case 'load':
                if (xhr.status == 200) {
                    cb(xhr.response);
                    break;
                }
                // Any other status falls through to the error branch below.
            default:
                Services.prompt.alert(null, 'XHR Error', 'Error Fetching Package: ' + xhr.statusText + ' [' + ev.type + ':' + xhr.status + ']');
                break;
        }
    };

    let evf = f => ['load', 'error', 'abort'].forEach(f);
    evf(m => xhr.addEventListener(m, handler, false));

    xhr.mozBackgroundRequest = true;
    xhr.open('GET', url, true);
    xhr.channel.loadFlags |= Ci.nsIRequest.LOAD_ANONYMOUS | Ci.nsIRequest.LOAD_BYPASS_CACHE | Ci.nsIRequest.INHIBIT_PERSISTENT_CACHING;
    // Leave responseType unset so the response comes back as a string.
    // Set it to "arraybuffer" only for binary downloads such as a zip file.
    //xhr.responseType = "arraybuffer";
    xhr.send(null);
}

Example usage of this snippet:

var href = 'http://www.bing.com/'
xhr(href, data => {
    Services.prompt.alert(null, 'XHR Success', data);
});
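
The question also asks about saving one specific line of the page. Because the callback receives the response as a string, one possible approach is to split it into lines and search them. This is only a minimal sketch: `findLine` and the sample markup are hypothetical and not part of the snippet above.

```javascript
// Hypothetical helper: return the first line of `html` that contains `needle`,
// trimmed of surrounding whitespace, or null when no line matches.
function findLine(html, needle) {
    for (const line of html.split(/\r?\n/)) {
        if (line.includes(needle)) {
            return line.trim();
        }
    }
    return null;
}

// Sample input for illustration only.
const sample = [
    '<html>',
    '  <head><title>Example</title></head>',
    '  <body><a href="next.html">Next</a></body>',
    '</html>'
].join('\n');

const titleLine = findLine(sample, '<title>');
```

Inside the add-on, the same helper could be called from the callback, for example `xhr(href, data => { let line = findLine(data, '<title>'); });`.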

Without knowing the page and the URL to find on it, I can't create a complete solution, but here's an example Greasemonkey script I wrote that does something similar.

This script is for Java articles on DZone. When an article has a link to its source, the script redirects to that source page:

// ==UserScript==
// @name        DZone source
// @namespace   com.kwebble
// @description Directly go to the source of a DZone article.
// @include     http://java.dzone.com/*
// @version     1
// @grant       none
// ==/UserScript==

var node = document.querySelector('a[target="_blank"]');

if (node !== null) {
    document.location = node.getAttribute('href');
}

Usage:

  • Install Greasemonkey if you haven't yet.
  • Create the script, similar to mine. Set the value for @include to the page that contains the URL to find.
  • You must determine what identifies the part of the page with the destination URL and change the script to find that URL. For my script it's a link with a target of "_blank".

After saving the script, visit the page with the link. Greasemonkey should execute your script and redirect the browser.

[edit] This script searches the page's script tags for text like you described and redirects:

// ==UserScript==
// @name        Test
// @namespace   com.kwebble
// @include     your_page
// @version     1
// @grant       none
// ==/UserScript==

var nodes = document.getElementsByTagName('script'),
    i, matches;

for (i = 0; i < nodes.length; i++) {
    if (nodes.item(i).innerHTML !== '') {
        matches = nodes.item(i).innerHTML.match(/window\.location = "(.*?)\.php";/);

        if (matches !== null){
            document.location = matches[1];
        }
    }
}

The regular expression to find the URL might need some tweaking to match the exact page content.
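
As an illustration of that kind of tweaking, here is a more tolerant pattern. It is a sketch under assumptions, since the real page content is unknown: `REDIRECT_RE` and `findRedirectTarget` are hypothetical names, and the pattern assumes the page assigns a quoted string to `window.location` or `window.location.href`.

```javascript
// Hypothetical pattern: tolerates optional whitespace around "=", either quote
// style, and an optional trailing semicolon. It captures the whole URL,
// including any file extension.
const REDIRECT_RE = /window\.location(?:\.href)?\s*=\s*(["'])(.*?)\1\s*;?/;

// Return the assigned URL, or null when the text contains no such assignment.
function findRedirectTarget(scriptText) {
    const m = scriptText.match(REDIRECT_RE);
    return m === null ? null : m[2];
}
```

In the script above, `findRedirectTarget(nodes.item(i).innerHTML)` could stand in for the inline `match` call, with a `null` check before redirecting.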

An add-on and a Greasemonkey script take a similar approach, but an add-on can use native Firefox APIs (although writing one is a lot more complicated than writing a script).

Basically, this is the process (without knowing your exact requirements):

  1. Get the content of a remote URL with XMLHttpRequest()

  2. Get the data that you need with RegEx or DOMParser()

  3. Change the current URL to that target with location.replace()
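
The three steps above can be sketched roughly as follows. This is an illustration, not a complete solution: `extractTargetUrl` and `fetchAndRedirect` are hypothetical names, and the regular expression assumes the wanted link carries `target="_blank"` with `href` written before `target`.

```javascript
// Step 2: pull the target URL out of the fetched HTML with a regular expression.
// Assumption: the link looks like <a ... href="URL" ... target="_blank">.
function extractTargetUrl(html) {
    const m = html.match(/<a\s[^>]*href="([^"]+)"[^>]*target="_blank"/i);
    return m === null ? null : m[1];
}

// Steps 1 and 3: fetch the page, then navigate to the extracted URL.
// This only works where XMLHttpRequest and location exist, such as a
// web page or a user script; nothing runs until the function is called.
function fetchAndRedirect(pageUrl) {
    const req = new XMLHttpRequest();
    req.addEventListener('load', () => {
        if (req.status !== 200) {
            return; // give up quietly on HTTP errors
        }
        const target = extractTargetUrl(req.responseText);
        if (target !== null) {
            location.replace(target);
        }
    });
    req.open('GET', pageUrl, true);
    req.send(null);
}
```

In a user script the request is subject to the same-origin policy, and a relative `href` would be resolved against the current page rather than the fetched one, so absolute URLs are the safer assumption here.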
