
JavaScript method to remove case-insensitive duplicates from strings/numbers

Looking for a clean way to remove duplicates while keeping the first occurrence of each duplicated number/letter.

Let's say I have a string

AbCyTtaCc113

What I want remaining is

AbCyT13

  [...input].filter((s => c => !s.has(c.toLowerCase()) && s.add(c.toLowerCase()))(new Set)).join("")

Spreading a string produces an array of one-character strings, and a Set makes it easy to filter out duplicates. The logic is basically:

  • Turn the string into an array of characters by spreading it ( [...input] ).

  • Create a Set and store it inside the closure as s ( (s => ...)(new Set) ).

  • Keep a character only if

    • its lowercase form is not in the Set yet ( !s.has(c.toLowerCase()) )

    • and, in that case, add it to the Set ( && s.add(c.toLowerCase()) ); add returns the Set itself, which is truthy, so the character is kept

  • Turn the filtered array back into a string by joining it.
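The steps above can be unpacked into a named function, which may be easier to follow than the closure one-liner. This is only a minimal sketch; the name dedupeIgnoreCase is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper (not from the original answer): removes
// case-insensitive duplicates, keeping the first occurrence as written.
function dedupeIgnoreCase(input) {
  const seen = new Set(); // lowercase forms of characters already kept
  return [...input]
    .filter((c) => {
      const key = c.toLowerCase();
      if (seen.has(key)) return false; // duplicate: drop it
      seen.add(key);
      return true; // first occurrence: keep it
    })
    .join("");
}

console.log(dedupeIgnoreCase("AbCyTtaCc113")); // "AbCyT13"
```

The explicit if/return avoids relying on s.add(...) returning a truthy value, at the cost of a few more lines.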

Or the case-sensitive version:

[...new Set(input)].join("")
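A quick check of the Set-only one-liner on the question's sample string shows that it compares characters exactly, so "T" and "t" both survive. The variable names here are illustrative only.

```javascript
const input = "AbCyTtaCc113";

// A Set built directly from the string treats "T" and "t" as different values.
const exactDedupe = [...new Set(input)].join("");

console.log(exactDedupe); // "AbCyTtac13"
```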

The imperative version would be:

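  let result = "";
  {
     const duplicates = new Set(); // lowercase forms of characters already seen
     for(const char of input) {
        if(!duplicates.has(char.toLowerCase())) {
          duplicates.add(char.toLowerCase());
          result += char; // keep the first occurrence with its original casing
        }
     }
  }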

This doesn't need ES6; it makes the code neither faster nor cleaner in this case. Walk the string once and reduce:

"AbCyTtaCc113"
.split("")
.reduce((ac,d)=>{!~ac.toLowerCase().indexOf(d.toLowerCase()) && (ac += d); return ac;},"")
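The !~ in that snippet is a terse "not found" test: indexOf returns -1 when there is no match, and the bitwise NOT of -1 is 0. A small sketch of the idiom follows, together with a possibly more readable rewrite that swaps in String.prototype.includes (an ES2015 method, so it is an assumption that ES2015 is acceptable here).

```javascript
// indexOf returns -1 for "not found"; ~(-1) === 0, so !~(-1) is true.
const missing = "abc".indexOf("z"); // -1
const present = "abc".indexOf("b"); // 1
console.log(!~missing); // true
console.log(!~present); // false

// Same reduce, with includes() in place of the !~indexOf idiom.
const reduced = "AbCyTtaCc113"
  .split("")
  .reduce(
    (acc, ch) => (acc.toLowerCase().includes(ch.toLowerCase()) ? acc : acc + ch),
    ""
  );
console.log(reduced); // "AbCyT13"
```

Both forms lowercase the accumulator on every step, so the work grows quadratically with the input length; for short strings like this one that is unlikely to matter.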

Using Array.prototype.reduce would be one way to address this.

Edit: As others have mentioned, there are valid reasons not to use this in production-level solutions, so take it as one possible, though not advisable, way.

  console.log(
    [..."AbCyTtaCc113"]
      .reduce(
        (acc, val) =>
          acc.includes(val.toUpperCase()) || acc.includes(val.toLowerCase())
            ? acc
            : [...acc, val],
        []
      )
      .join("")
  )

I believe @ggorlen has the appropriate solution, although he uses some rather modern JavaScript (I haven't seen the syntax of calling functions with back-ticks instead of parentheses before, but I'll trust it works in some interpreters).
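The back-tick call syntax mentioned above is the tagged template form introduced in ES2015. The sketch below is not taken from @ggorlen's answer; it only illustrates how the syntax behaves with split and join.

```javascript
// A function followed directly by a template literal is called as a "tag":
// its first argument is an array holding the literal's string parts.
const chars = "AbC".split``; // split receives [""], which is coerced to the separator ""
console.log(chars.length);   // 3

const joined = chars.join`-`; // join receives ["-"], which is coerced to "-"
console.log(joined);          // "A-b-C"
```

It saves two characters per call, but ordinary parentheses are probably clearer for most readers.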

For a simpler "old-school" answer, try something like the following:

var input = "AbCyTtaCc113";
var output = "";
var unique = {};
for (var i = 0; i < input.length; i++) {
    if (!unique[input[i].toLowerCase()]) {
        unique[input[i].toLowerCase()] = 1;
        output += input[i];
    }
}

Second update

The ES6 solution below is preserved for historical purposes, but @ggorlen pointed out that I failed to preserve casing in the output. The conversion to lowercase should therefore happen only inside the filter check, not beforehand:

var input = "AbCyTtaCc113";
var seen = new Set();
var output = input
    .split("")
    .filter(x => !seen.has(x.toLowerCase()) && seen.add(x.toLowerCase()))
    .join("")
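To make the casing difference concrete, here is a small side-by-side sketch of the two variants on the sample input. The variable names (seenMapped, seenPreserved, and so on) are illustrative only.

```javascript
const input = "AbCyTtaCc113";

// Variant 1: lowercase in a separate map step, so lowercase characters reach the output.
const seenMapped = new Set();
const mapped = input
  .split("")
  .map((x) => x.toLowerCase())
  .filter((x) => !seenMapped.has(x) && seenMapped.add(x))
  .join("");

// Variant 2: lowercase only for the membership check, so original casing is kept.
const seenPreserved = new Set();
const preserved = input
  .split("")
  .filter((x) => {
    const key = x.toLowerCase();
    return !seenPreserved.has(key) && seenPreserved.add(key);
  })
  .join("");

console.log(mapped);    // "abcyt13"
console.log(preserved); // "AbCyT13"
```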

Update

Since everyone is having fun writing ES6 answers and trying to solve this with reduce or map or Set or some other such tool, let me write what I think is the best answer using ES6:

var input = "AbCyTtaCc113";
var seen = new Set();
var output = input
    .split("")
    .map(x => x.toLowerCase())
    .filter(x => !seen.has(x) && seen.add(x))
    .join("")

Or, if you prefer unreadable one-line nonsense:

var input = "AbCyTtaCc113";

var seen = new Set(), output = input.split("").map(x => x.toLowerCase()).filter(x => !seen.has(x) && seen.add(x)).join("");

I prefer this solution because:

  1. It uses method chaining to pass the input through a series of simple transforms that future programmers can easily read and interpret (first split, then map, then filter, then join).
  2. It avoids syntax that isn't supported in many JavaScript interpreters (no spread operators, back-ticks, etc.).
  3. It uses each method for its semantic purpose (split converts a string to an array, map modifies each value of an array, filter selects a subset of an array, join converts an array to a string).

That said, if performance is your primary concern and you're running this in the Google Chrome console, I must admit that preliminary benchmarks peg Jonas Wilm's answer as faster than mine:

var d0 = new Date(); for (var i = 0; i < 50000; i++) {
    var input = "AbCyTtaCc113";
    var output = [...input].filter((s => c => !s.has(c.toLowerCase()) && s.add(c.toLowerCase()))(new Set)).join("");
} console.log(new Date() - d0);
// ~175

var d0 = new Date(); for (var i = 0; i < 50000; i++) {
    var input = "AbCyTtaCc113";
    var seen = new Set();
    var output = input
        .split("")
        .map(x => x.toLowerCase())
        .filter(x => !seen.has(x) && seen.add(x))
        .join("");
} console.log(new Date() - d0);
// ~231

I believe this is because .map() allocates a new intermediate array (whereas his solution calls .toLowerCase() inside the .has() check during filtering and so avoids that extra array), but I prefer this for clarity. Unless you're dealing with code where performance is of the utmost importance, I think being able to read the code is more important than eking out an extra millisecond.
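For anyone who wants to repeat the comparison, the two loops above could be wrapped in a small timing helper along these lines. The timeIt function is hypothetical, and the numbers will vary by engine and machine, so treat any result as a rough indication only.

```javascript
// Hypothetical helper: runs fn `iterations` times and returns the elapsed milliseconds.
function timeIt(fn, iterations) {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) fn();
  return Date.now() - start;
}

const input = "AbCyTtaCc113";

// Closure-based version: lowercases only inside the filter check.
const closureVersion = () =>
  [...input]
    .filter(((s) => (c) => !s.has(c.toLowerCase()) && s.add(c.toLowerCase()))(new Set()))
    .join("");

// Chained version: lowercases in a separate map step (extra intermediate array).
const chainedVersion = () => {
  const seen = new Set();
  return input
    .split("")
    .map((x) => x.toLowerCase())
    .filter((x) => !seen.has(x) && seen.add(x))
    .join("");
};

const closureMs = timeIt(closureVersion, 50000);
const chainedMs = timeIt(chainedVersion, 50000);
console.log({ closureMs, chainedMs });
```

Note that the two functions do not return the same string (the chained one lowercases its output), so this compares the approaches as posted rather than two equivalent implementations.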

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please cite this site's URL or the original address.