
Chrome extension: Get the text of a web page from a given URL

First, I'm a complete newbie at making Chrome extensions. In one part of my extension I will receive different URLs, and I want to store the text of each web page to process it later, producing an array of boolean values, each associated with one of the given URLs. Schematically it would be something like this:

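var result = []; // one boolean per url
function process(text) {
  // something(text) is a placeholder for the actual check on the page text
  if (something(text)) result.push(true);
  else result.push(false);
}
function main() {
  for (var i = 0; i < given.length; i++) {
    var url = given[i];
    var text = getHTMLText(url); // the part I don't know how to write
    process(text);
  }
  final(); // when the loop finishes, call another function that uses the global variable: result
}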

I have problems with the main function. First I tried a synchronous XMLHttpRequest. It works, but it is very slow, and Chrome always warns that synchronous XMLHttpRequest is deprecated.

    for (var i = 0; i < urls.length; i++) {
        var url = urls[i];

        var req = new XMLHttpRequest();
        req.open('GET', url, false); // false = synchronous, blocks until the response arrives
        req.send(null);
        if (req.status == 200) detecting(req.responseText);
    }

Another solution I found was to use fetch(url), but I don't fully understand the code I found. The returned text is correct, but the process function then gives different results on each page update.

    for (var i = 0; i < urls.length; i++) {
        var url = urls[i];
        fetch(url).then(function(response) {
            response.text().then(function(text) {
                detecting(text);
            });
        });
    }

Another problem, which comes from how little I know about fetch(), is that I can't store the text outside of the fetch() call: every time I do console.log it prints undefined, which greatly complicates processing the text.

I have seen that maybe it can be done through Chrome's extension APIs, but I can't see how to do it.

The algorithm shown in your main pseudocode can be implemented easily with async/await and Promise.all, without a for loop:

(async () => {
  const results = await Promise.all(urls.map(processUrl));
  console.log(results);
  // further processing must be also inside this IIFE
})();

async function processUrl(url) {
  try {
    const text = await (await fetch(url)).text();
    return {url, text, status: detecting(text)};
  } catch (error) {
    return {url, error};
  }
}
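
If you only need the array of booleans from the question, a variation along these lines might work. It is a sketch, not a drop-in implementation: the detecting function below is a hypothetical stand-in for your real check, checkUrl and checkAll are names made up for this example, and treating failed requests as false is an assumption. Two things it illustrates: Promise.all returns results in the same order as the input array (which avoids the "different results on each page update" caused by callbacks finishing in arbitrary order), and fetch() does not reject on HTTP errors such as 404, so response.ok has to be checked explicitly, like req.status == 200 in the XMLHttpRequest version.

```javascript
// Hypothetical stand-in for the asker's detecting(text); replace with the real check.
function detecting(text) {
  return text.includes('example');
}

// Fetch one page and resolve to true/false.
// fetchFn defaults to the global fetch; it is a parameter only so the
// function can be tried without a network connection.
async function checkUrl(url, fetchFn = globalThis.fetch) {
  try {
    const response = await fetchFn(url);
    // fetch() resolves even for 404/500 responses, so test the status explicitly.
    if (!response.ok) return false;
    return Boolean(detecting(await response.text()));
  } catch (error) {
    // Network failure or blocked request: counted as false here (an assumption).
    return false;
  }
}

// All requests start at once; Promise.all keeps the input order, so
// result[i] belongs to urls[i] no matter which request finishes first.
async function checkAll(urls, fetchFn = globalThis.fetch) {
  return Promise.all(urls.map((url) => checkUrl(url, fetchFn)));
}
```

Usage would then look like checkAll(urls).then(result => final(result)), with final taking the array as a parameter instead of reading a global variable. This assumes the extension's manifest grants host permissions for the URLs being fetched.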

