Check if an element of an array is the same as the following

Question

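I'm building a service that parses a PDF into text. Once I have that text, I have to match a set of words against it, and each match increments a counter. So far so good. The difficulty is that, while parsing the text, I have no way of checking which page of the PDF I'm on. I've noticed that, when splitting, every two consecutive line breaks (\n\n) mean a page change.

What I want to do is detect when the page has changed and, besides counting the total number of times a word was found, report which pages it appears on.

Example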

let data =  `resignations / resignations. adm. mancom .: berenguer llinares
appointments. adm. unique: calvo valenzuela. other concepts: change of the administrative body:
joint administrators to sole administrator. change of registered office. ptda colomer, 6

Official Gazette of the Commercial Registry
no. 182 Friday, September 18, 2020 p. 33755
cve: borme-a-2020-182-03 verifiable in
sarria). registry data. t 2257, f 100, s 8, h a 54815, i / a 4 (10.09.20) .`



let wordsToSearch = ['resignations', 'administrators']

wordsToSearch.forEach((word) => {
    // inside of here I would like to keep track of the page as well
    let stringArray = data.split(' ');
    let count = 0;
    let result = ""
    for (var i = 0; i < stringArray.length; i++) {
        let wordText = stringArray[i];
        if (new RegExp(word).test(wordText)) {
            count++
        }
    }
    // the expected result would be: word has appeared count times in the pages etc.
    result += `${word} has appeared ${count} times\n`
    console.log(result)
    /*
    resignations has appeared 2 times

    administrators has appeared 1 times
    */
})

If anyone can come up with another way of doing this, that would be great too.

Answer 1

You can split the text at those double line breaks and then analyse each page separately. Here is how I would do it:

let data = `resignations / Friday resignations. adm. mancom .: berenguer llinares
appointments. adm. unique: calvo Friday valenzuela. other concepts: change of the administrative body:
joint administrators to sole administrator. change of registered office. ptda colomer, 6, Friday

Official Gazette of the Commercial Registry
no. 182 Friday, September 18, 2020 p. 33755
cve: borme-a-2020-182-03 verifiable in
sarria). registry data. t 2257, f 100, s 8, ha 54815, i / a 4 (10.09.20) .`

function analyseText(text, wordsToFind) {
    // two consecutive line breaks mark a page change
    const pages = text.split("\n\n");
    const result = {};
    for (let pageIndex = 0; pageIndex < pages.length; pageIndex++) {
        analysePage({ pageIndex, pageText: pages[pageIndex] }, wordsToFind, result);
    }
    return Object.keys(result).map(k => result[k]);
}

function analysePage(page, wordsToFind, result) {
    const { pageText, pageIndex } = page;
    wordsToFind.forEach(word => {
        // count every occurrence of the word on this page
        const count = (pageText.match(new RegExp(word, 'g')) || []).length;
        if (count > 0) {
            if (!result[word]) {
                result[word] = { name: word, pageIndices: [], count: 0 };
            }
            result[word].pageIndices.push(pageIndex);
            result[word].count += count;
        }
    });
}

const result = analyseText(data, ['resignations', "administrators", "Friday"]);
console.log(result);

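In this example I collect the results per word (the total count and the indices of the pages it appears on), but you can of course build up whatever result object suits you, for instance one in which you save the results for each page.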
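As one possible variation, the results could be stored per page instead of per word. The following is only a minimal sketch under the same assumption as above (pages are separated by "\n\n"); the helper names `escapeRegExp` and `countWordsPerPage` are hypothetical and not part of the original answer. It also escapes regex metacharacters, so that a search word such as "adm." is matched literally.

```javascript
// Hypothetical helper: escape regex metacharacters so the search word
// is treated as literal text rather than as a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns one entry per page, each holding the counts
// of the words found on that page. Assumes "\n\n" marks a page change.
function countWordsPerPage(text, wordsToFind) {
  return text.split("\n\n").map((pageText, pageIndex) => {
    const counts = {};
    for (const word of wordsToFind) {
      const matches = pageText.match(new RegExp(escapeRegExp(word), "g")) || [];
      if (matches.length > 0) {
        counts[word] = matches.length;
      }
    }
    return { pageIndex, counts };
  });
}

const sample = "resignations and administrators\n\nresignations again";
console.log(countWordsPerPage(sample, ["resignations", "administrators"]));
```

Note that `pageIndex` is zero-based here, so add 1 if you want to report human-readable page numbers.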

Solution 1
1 Accepted 2020-09-19 21:23:36