简体   繁体   中英

Vec::dedup does not work — how do I deduplicate a vector of strings?

I've parsed a file, split the string by lines and want to leave only unique elements in each vector. I expect vec.dedup() to work like this:

let mut vec = vec!["a", "b", "a"];
vec.dedup();
assert_eq!(vec, ["a", "b"]);

But it fails:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `["a", "b", "a"]`,
 right: `["a", "b"]`', src/main.rs:4:4

How can I remove duplicates?

As documented, Vec#dedup only removes consecutive elements from a vector (it is much cheaper than a full deduplication). It would work fine if the vector was vec!["a", "a", "b"] , for example.

Of course, there are multiple potential solutions.

In order to obtain a vector with all duplicates removed while retaining the original order of the elements, the itertools crate provides a unique adaptor.

use itertools::Itertools;

let v = vec!["b", "a", "b"];
let v: Vec<_> = v.into_iter().unique().collect();
assert_eq!(v, ["b", "a"]);

If element order is not important, you may sort the elements first and then call dedupe.

let mut v = vec!["a", "b", "a"];
v.sort_unstable();
v.dedup();
assert_eq!(v, ["a", "b"]);

If fast element lookup is important, you may also consider using a set type instead, such as HashSet .

let v: HashSet<_> = ["a", "b", "a"].iter().cloned().collect();
let v2: HashSet<_> = ["b", "a"].iter().cloned().collect();
assert_eq!(v, v2);

The other answer points out that a HashSet is a better choice for a collection without duplicates, which I agree with. This shows how to directly deduplicate a Vec using that property of HashMap and without sorting the Vec first to use std::vec::Vec::dedup .

use std::hash::Hash;
use std::collections::HashSet;

fn dedup<T: Eq + Hash + Copy>(v: &mut Vec<T>) { // note the Copy constraint
    let mut uniques = HashSet::new();
    v.retain(|e| uniques.insert(*e));
}

fn main() {
    let mut v = vec!["a", "b", "a"];
    dedup(&mut v);

    assert_eq!(&v, &vec!["a", "b"]);
}

This is a fast (O(n)) solution, but creating the HashSet requires some extra memory.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM