
JavaScript is taking too long to extract a unique array out of a (multiline) string

I have a huge data set that gets loaded into a pre tag, like the following.

00:00:00 INFO SERVER-SYSTEM - Cmd Line Arg: sysName = SERVER
00:00:01 INFO SERVER-SYSTEM - Cmd Line Arg: resultsDirName = github
00:00:02 INFO SERVER-SYSTEM - Cmd Line Arg: Device4Branch = //github/server_manager01/test1
00:00:02 FAIL SERVER-SYSTEM - Cmd Line Arg: testCase = server_manager01
00:00:03 INFO SERVER-SYSTEM - Cmd Line Arg: timestamp_style = RELATIVE
00:00:04 INFO SERVER-SYSTEM - Cmd Line Arg: token = 36
00:00:04 FAIL SERVER-SYSTEM - Cmd Line Arg: Campaign = True

There will be around 30,000+ lines, and I want to store the unique words in an array. The following code fetches the data from the div that holds this pre tag's content and stores the unique, space-separated words in an array.

var pre_data = document.getElementById("data_div").innerHTML.split('\n');
var words = [];
var reg = new RegExp("\\S*", "ig");

for (var x = 0; x < pre_data.length; x++) {
   words = words.concat(pre_data[x].match(reg));
}

// Remove the empty strings that \S* also matches
var filtered_data = words.filter(function (el) {
   return el != '';
});

// A Set keeps only unique values
var unique_data = Array.from(new Set(filtered_data));

But this takes 10+ seconds when there are 30,000+ lines. What would be an efficient way to make it faster?
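A likely cause (an assumption, not a measured diagnosis): `words.concat(...)` builds a brand-new array on every loop iteration and copies everything accumulated so far, so the loop does roughly quadratic work in the number of words. The sketch below avoids that by matching the whole text once with `\S+`, which never yields empty strings (so the filter step disappears), and handing the matches straight to a `Set`. The function name `uniqueWords` is hypothetical, and it takes a plain string so it runs outside a browser.

```javascript
// Hypothetical helper: return the unique whitespace-separated words of a
// (multiline) string, in first-seen order, without repeated array copies.
function uniqueWords(text) {
  // With the g flag, match() returns every match at once, or null if none.
  // \S+ requires at least one non-whitespace character, so no empty strings.
  const words = text.match(/\S+/g) || [];
  // A Set drops duplicates and iterates in insertion order.
  return Array.from(new Set(words));
}

const sample =
  "00:00:02 FAIL SERVER-SYSTEM - Cmd Line Arg: testCase = server_manager01\n" +
  "00:00:04 FAIL SERVER-SYSTEM - Cmd Line Arg: Campaign = True";
console.log(uniqueWords(sample));
```

In the page this could be called as `uniqueWords(document.getElementById("data_div").textContent)`. `textContent` is probably preferable to `innerHTML` here, because `innerHTML` returns serialized markup, so characters such as `<` or `&` in the log would appear as `&lt;` or `&amp;`.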

I'm not sure whether this is what you are looking for, but this is how I would do it:

  • Find the element, get its innerHTML or innerText, and split it by newline, which gives you an array of lines
  • Reduce the lines into one array containing all the words of every line:
    • Split the line by white space, which gives you an array of words
    • Filter out the empty words with w => w
    • Add the words array to the accumulator and return it
  • Now that you have all the words, you can filter out the duplicates

PS: you don't have to create 3 variables as I did; you can chain the array functions and create a single variable if you want.

var linesArray = document.getElementById('my-pre').innerHTML.split('\n');
var words = linesArray.reduce((acc, cur) => {
  return acc.concat(cur.split(' ').filter(w => w));
}, []);
var uniqueWords = words.filter((value, index, self) => self.indexOf(value) === index);
console.log(uniqueWords);
 <pre id="my-pre"> Hello Stackoverflow I'm having some issues with extracting a unique array out of my multiline string this is a test to see if the the array is unique enough this is a test to see if the the array is unique enough this is a test to see if the the array is unique enough </pre>
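Following the PS, the three variables can be chained into one expression. The sketch below does that in a hypothetical `uniqueWordsChained` function over a plain string (so it runs without a DOM), and it swaps two steps for cheaper equivalents: `flatMap` (ES2019) replaces `reduce` + `concat`, since `concat` inside `reduce` copies the accumulator for every line, and a `Set` replaces the `self.indexOf(value) === index` filter, since `indexOf` rescans the array from the start for every element and makes the de-duplication quadratic.

```javascript
// Hypothetical chained variant of the steps above:
// lines -> non-empty words per line -> unique words (first-seen order).
function uniqueWordsChained(text) {
  return Array.from(
    new Set(
      text
        .split("\n")
        // Split each line on single spaces and drop the empty strings
        // produced by consecutive spaces, like filter(w => w) above.
        .flatMap((line) => line.split(" ").filter((w) => w))
    )
  );
}

console.log(uniqueWordsChained("this is a test\nthis is  a test too"));
```

Like the original, this splits on single spaces only, so tabs stay inside words; splitting on `/\s+/` instead would treat any whitespace as a separator.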
