JavaScript: extracting a unique array from a (multiline) string takes too long

Question

I have a huge data set that is loaded into a pre tag, as shown below.

00:00:00 INFO SERVER-SYSTEM - Cmd Line Arg: sysName = SERVER
00:00:01 INFO SERVER-SYSTEM - Cmd Line Arg: resultsDirName = github
00:00:02 INFO SERVER-SYSTEM - Cmd Line Arg: Device4Branch = //github/server_manager01/test1
00:00:02 FAIL SERVER-SYSTEM - Cmd Line Arg: testCase = server_manager01
00:00:03 INFO SERVER-SYSTEM - Cmd Line Arg: timestamp_style = RELATIVE
00:00:04 INFO SERVER-SYSTEM - Cmd Line Arg: token = 36
00:00:04 FAIL SERVER-SYSTEM - Cmd Line Arg: Campaign = True

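There will be roughly 30,000+ lines, and I want to store the unique words in an array. Below is the code that reads the data from the div containing this pre tag and stores the unique whitespace-separated words in an array.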

var pre_data = document.getElementById("data_div").innerHTML.split('\n');
var words = [];
var reg = new RegExp("\\S*", "ig");

for (var x = 0; x < pre_data.length; x++) {
   words = words.concat(pre_data[x].match(reg));
}

// Remove the empty strings that \S* also matches
var filtered_data = words.filter(function (el) {
   return el != '';
});

// Set gives unique data
var unique_data = Array.from(new Set(filtered_data));

But with 30,000+ lines this takes 10+ seconds. What would be a more efficient way to get the result faster?

Answer 1

I'm not sure whether this is what you're looking for, but this is how I would do it:

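Find the element, get its innerHTML or innerText, and split it on newlines, which gives you an array of lines
Reduce the lines to build a single array containing the words of every line
- Split the line on spaces, which gives you an array of words
- Filter out empty words with w => w
- Add the array of words to the accumulator and return it
Now that you have all the words, you can filter out the duplicates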

PS: You don't have to create 3 variables like I did; you can chain the array functions and create a single variable if you prefer.
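As a rough illustration of the chaining mentioned in the PS, the same steps could be written as one expression. This is only a sketch: the function name `uniqueWordsChained` is made up for the example, and it takes the text as a parameter so it can run outside the browser.

```javascript
// Hypothetical helper (name is an assumption): the split / reduce / filter
// steps described above, chained into a single expression.
function uniqueWordsChained(text) {
  return text
    .split('\n') // one entry per line
    .reduce(
      // split each line on spaces, drop empty words, append to the accumulator
      (acc, line) => acc.concat(line.split(' ').filter(w => w)),
      []
    )
    // keep a word only at the index of its first occurrence
    .filter((word, index, self) => self.indexOf(word) === index);
}

// In the browser the input would come from the element, for example:
// uniqueWordsChained(document.getElementById('my-pre').innerHTML)
const sample = 'a b  a\nc b';
console.log(uniqueWordsChained(sample)); // [ 'a', 'b', 'c' ]
```

This only changes the shape of the code, not its cost: the work done is the same as in the three-variable version.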

var linesArray = document.getElementById('my-pre').innerHTML.split('\n');
var words = linesArray.reduce((acc, cur) => {
  return acc.concat(cur.split(' ').filter(w => w));
}, []);
var uniqueWords = words.filter((value, index, self) => self.indexOf(value) === index);
console.log(uniqueWords);

<pre id="my-pre">
Hello Stackoverflow
I'm having some issues with extracting a unique array out of my multiline string
this is a test to see if the the array is unique enough
this is a test to see if the the array is unique enough
this is a test to see if the the array is unique enough
</pre>
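For input on the order of 30,000 lines, note that calling `concat` once per line copies the growing accumulator each time, and an `indexOf` filter rescans the array for every word, so both grow roughly quadratically with the number of words. A possible alternative, not timed against the asker's data, is to match all non-whitespace runs in a single pass and let a `Set` remove the duplicates. The function name `uniqueWordsFast` is an assumption made for this sketch.

```javascript
// Hypothetical helper (name is an assumption): collect unique
// whitespace-separated words in a single pass over the text.
function uniqueWordsFast(text) {
  // \S+ matches each run of non-whitespace characters, across all lines;
  // match() returns null when there are no matches, hence the fallback.
  const tokens = text.match(/\S+/g) || [];
  // A Set keeps one copy of each value, in first-seen order.
  return Array.from(new Set(tokens));
}

// In the browser the input would come from the element, for example:
// uniqueWordsFast(document.getElementById('data_div').textContent)
const sampleLog =
  '00:00:04 INFO SERVER-SYSTEM - Cmd Line Arg: token = 36\n' +
  '00:00:04 FAIL SERVER-SYSTEM - Cmd Line Arg: Campaign = True';
console.log(uniqueWordsFast(sampleLog));
```

Using `textContent` rather than `innerHTML` in the comment is a deliberate choice: `innerHTML` returns the serialized markup, so characters such as `&` come back as entities like `&amp;`, while `textContent` returns the plain text.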

Solution 1
0 Accepted 2020-07-08 10:44:57
