Is there a way to filter data from an HTML document?

Question

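I'm currently developing a Chrome extension that extracts data from a website's HTML document, but I need to build a filter to get only what I actually want.

In this attempt, the extension takes the page's HTML and converts it to a string so that it can be manipulated easily: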

// Takes a string and counts how many times
// the word you're looking for occurs in it.
// Note: `word` is used as a regular expression pattern.
function countWordInAString(string, word) {
    return (string.match(new RegExp(word, "g")) || []).length;
}

function getOutlookData(html) {
    var unreaded = countWordInAString(html, 'no leídos');
    var readed = countWordInAString(html, 'leídos');
    var totalMails = countWordInAString(html, 'id="AQAAA1thnTQBAAAEA7R1mgAAAAA="');
    var message = totalMails + 'Mails loaded! \n Mails readed: ' + readed + '\n Mails unreaded: ' + unreaded;

    return message + '\n' + "HTML:\n" + html;
}

It works in some specific cases, but for obfuscated websites (like Outlook in this example), the results are wrong. What can I do to improve it?

Answer 1

Your "word" may contain special characters. Before passing it to your regular expression, escape them with backslashes, i.e.

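// Prefix every character that is neither whitespace nor a word character
// (letter, digit, underscore) with a backslash so it is matched literally.
const encodeForReg = str => str.replace(/([^\s\w])/g, '\\$1');
function countWordInAString(string, word) {
    const encodedWord = encodeForReg(word);
    return (string.match(new RegExp(encodedWord, "g")) || []).length;
}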

id="AQAAA1thnTQBAAAEA7R1mgAAAAA="

becomes

id\=\"AQAAA1thnTQBAAAEA7R1mgAAAAA\=\"
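To illustrate why the escaping matters, here is a minimal sketch that compares the unescaped and escaped counts on a made-up string. The sample text and the helper names `countUnescaped` and `countEscaped` are assumptions for illustration only; `encodeForReg` is the function from the answer.

```javascript
// Same escaping helper as in the answer above.
const encodeForReg = str => str.replace(/([^\s\w])/g, '\\$1');

// Hypothetical helper: counts matches WITHOUT escaping the search term.
function countUnescaped(string, word) {
    return (string.match(new RegExp(word, "g")) || []).length;
}

// Hypothetical helper: counts matches AFTER escaping the search term.
function countEscaped(string, word) {
    return (string.match(new RegExp(encodeForReg(word), "g")) || []).length;
}

// Made-up sample: two literal "a.b" plus one "axb".
const sample = 'a.b axb a.b';

// Unescaped, "." is a wildcard, so "axb" is counted as well.
console.log(countUnescaped(sample, 'a.b'));

// Escaped, only the literal text "a.b" is counted.
console.log(countEscaped(sample, 'a.b'));

// The id example from the answer, escaped.
console.log(encodeForReg('id="AQAAA1thnTQBAAAEA7R1mgAAAAA="'));
```

Unescaped metacharacters such as `.`, `+`, `(`, or `?` either change what is matched or make `new RegExp` throw, so escaping is only one possible cause of the wrong counts described in the question.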

Answer 1
0 Accepted 2020-10-04 04:55:55
