
What is the most efficient way of splitting a string and ensuring there are no duplicates in the resulting array?

I am splitting a JavaScript string into an array whose elements contain only sequences of Cyrillic characters.

    var text = "где по его проекту был реализован первый в мире компьютер с хранимой в памяти программой — ACE.";
    text = text.toLowerCase();
    var re = /[^йцукенгшщзхъёэждлорпавыфячсмитьбю]+/;
    var words = text.split(re);

In the above snippet, words will contain the following:

["где", "по", "его", "проекту", "был", "реализован", "первый", "в", "мире", "компьютер", "с", "хранимой", "в", "памяти", "программой", ""] 

I need to remove the duplicates from the array. Namely, I should see "в" only once. I know I can go through the array after the split and do this, but I am not sure what the best way is. Is it possible to do this with the split regex?

Jonathan

Not the most efficient, but it's clean and simple.

text.split(re).filter(function(str, idx, txtArray) {
    return txtArray.indexOf(str) === idx; 
});

Basically, if the first index found doesn't match the current index in the iteration, it's a duplicate.
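As a runnable sketch of the filter above (the shortened sample sentence is an assumption; the regex is the one from the question), this variant also drops the empty string that split leaves behind when the text ends with a non-Cyrillic character:

```javascript
// Shortened sample sentence (assumption); regex copied from the question.
var sample = "Первый в мире компьютер с хранимой в памяти программой.".toLowerCase();
var cyrillicSplit = /[^йцукенгшщзхъёэждлорпавыфячсмитьбю]+/;

var uniqueWords = sample.split(cyrillicSplit).filter(function (str, idx, arr) {
    // Keep non-empty strings whose first occurrence is at the current index.
    return str !== "" && arr.indexOf(str) === idx;
});

console.log(uniqueWords);
// ["первый", "в", "мире", "компьютер", "с", "хранимой", "памяти", "программой"]
```

The first occurrence of each word is kept, so the original word order is preserved.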

You have to go through the array. You can remember whether you've already seen a string by using an object as a map, e.g.:

var a = /* ...get the array... */;
var unique = [];
var n, len;
var str;
var seen = {};
for (n = 0, len = a.length; n < len; ++n) {
    str = a[n];
    if (!seen[str]) {
        seen[str] = true;
        unique.push(str);
    }
}

If there's any chance one of the string values may be a name that already exists on objects (so, "toString", "valueOf", "hasOwnProperty", and such), you have to modify the if (!seen[str]) check to use hasOwnProperty instead:

if (!seen.hasOwnProperty(str)) {

...but if the strings are as you've shown, you don't need that. Another alternative is to use a prefix like "xx":

var keystr = "xx" + str;
if (!seen[keystr]) {
    seen[keystr] = true;
    // ...
}

This works because no properties on plain objects start with "xx", and almost certainly none ever will.
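To make the pitfall concrete, here is a minimal sketch of the loop wrapped in a function (uniqueStrings is a hypothetical name). It is a variation on the hasOwnProperty check rather than the exact code above: the method is borrowed from Object.prototype, so that an input string "hasOwnProperty" cannot overwrite the method on the seen object itself. With the plain if (!seen[str]) check, "toString" and "valueOf" would be dropped entirely, because seen["toString"] is already a truthy inherited function.

```javascript
// Hypothetical helper; same loop as above with a safer membership test.
function uniqueStrings(a) {
    var unique = [];
    var seen = {};
    for (var n = 0, len = a.length; n < len; ++n) {
        var str = a[n];
        // Only own properties count, so inherited names such as
        // "toString" are not mistaken for strings we have already seen.
        if (!Object.prototype.hasOwnProperty.call(seen, str)) {
            seen[str] = true;
            unique.push(str);
        }
    }
    return unique;
}

console.log(uniqueStrings(["toString", "a", "hasOwnProperty", "toString", "a", "hasOwnProperty"]));
// ["toString", "a", "hasOwnProperty"]
```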


In a comment you've said:

I guess by efficient I mean the most elegant of idiomatic javascript way to do this.

Interesting, that's not a definition I'd've used. :-) Okay, here's another approach using ES5's filter, which is definitely more JavaScript-y:

var a = /* ...get the array... */;
var seen = {};
a = a.filter(function(str) {
    if (!seen[str]) {
        seen[str] = true;
        return true;
    }
    return false;
});
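For comparison, and not part of the original answer: in an environment that supports ES2015 (an assumption), a Set can do the same bookkeeping as the seen object. uniqueWithSet is a hypothetical name for this sketch.

```javascript
// Assumes an ES2015+ environment where Set and Array.from are available.
function uniqueWithSet(a) {
    // A Set stores each value once and iterates in first-insertion order.
    return Array.from(new Set(a));
}

console.log(uniqueWithSet(["в", "мире", "в", "памяти", "toString", "toString"]));
// ["в", "мире", "памяти", "toString"]
```

Because a Set compares values directly instead of using them as property names, strings such as "toString" need no special handling here.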

If you are willing to use a third-party library, I would recommend having a look at Underscore. This library provides a uniq method, which you would apply in the following way:

words = _.uniq(text.split(re));

You can get the "prettiness" of the .indexOf solution using some other built-in functions:

var uniq = Object.keys(text.split(re).reduce(function(words, word) {
  words[word] = null;
  return words;
}, {}));

This'll only work in newer versions of JavaScript, that is, ES5 environments; old versions of IE (8 and earlier) lack Object.keys and reduce. This has the advantage, like Mr. Crowder's version, of not being an O(n²) algorithm. On fairly large strings without many duplicates (say, a page full of text), those .indexOf() calls will start to warm up the client CPU.

Note that this will give you the unique words in no particular order.
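A rough way to see the difference is to time both approaches on an array with no duplicates, which is the worst case for .indexOf(). The sketch below uses synthetic words and hypothetical helper names (makeWords, uniqIndexOf, uniqKeys, time); the timings it prints depend on the engine and machine, so none are shown.

```javascript
// Build n distinct synthetic "words" (hypothetical test data).
function makeWords(n) {
    var out = [];
    for (var i = 0; i < n; i++) {
        out.push("w" + i);
    }
    return out;
}

// Quadratic: every element triggers a linear indexOf scan.
function uniqIndexOf(a) {
    return a.filter(function (s, i, arr) { return arr.indexOf(s) === i; });
}

// Roughly linear: one property write per element, then Object.keys.
function uniqKeys(a) {
    return Object.keys(a.reduce(function (acc, s) {
        acc[s] = null;
        return acc;
    }, {}));
}

// Run fn once and report elapsed milliseconds and the result size.
function time(fn, arg) {
    var start = Date.now();
    var result = fn(arg);
    return { ms: Date.now() - start, count: result.length };
}

var testWords = makeWords(5000);
console.log("indexOf:    ", time(uniqIndexOf, testWords));
console.log("Object.keys:", time(uniqKeys, testWords));
```

Raising the argument to makeWords makes the gap more visible, since the work done by the indexOf version grows with the square of the array length.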

How about using a negative lookahead in the regular expression and using the .match method to return an array of matches?

([йцукенгшщзхъёэждлорпавыфячсмитьбю]+)(?!.*\1)
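One caveat when using this pattern with .match and the g flag: as written, the lookahead rejects a word whenever the same letters appear anywhere later, even inside a longer word, so "он" is lost in "он она". The sketch below is a variation, not the answer's exact pattern. It adds word boundaries built from the question's letter set and assumes an engine with ES2018 lookbehind support; uniqueByLookahead is a hypothetical name. Like the original pattern, it keeps the last occurrence of each word.

```javascript
// Letter set copied from the question.
var letters = "йцукенгшщзхъёэждлорпавыфячсмитьбю";
var L = "[" + letters + "]";

// A whole word (no letter directly before or after it) that does not
// occur again later in the text as a whole word.
var lastOccurrences = new RegExp(
    "(?<!" + L + ")(" + L + "+)(?!" + L + ")" +
    "(?![\\s\\S]*(?<!" + L + ")\\1(?!" + L + "))",
    "g"
);

function uniqueByLookahead(text) {
    // match returns null when nothing matches, so fall back to [].
    return text.toLowerCase().match(lastOccurrences) || [];
}

console.log(uniqueByLookahead("он она"));       // ["он", "она"]
console.log(uniqueByLookahead("в мире в памяти")); // ["мире", "в", "памяти"]
```

The lookahead rescans the rest of the text for every candidate word, so this is still a quadratic approach in the worst case.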

You could do this (splitter: " "):

var m = 'azerty rty aze rty aze'
    .replace(/(^| )([^ ]+)(?= |$)(?=.* \2( |$))/g, '') // removes duplicates
    .match(/[^ ]+/g) 
m; // ["azerty", "rty", "aze"]

Surely not the most efficient way, though.

