Remove duplicated elements from list

I have a list of character vectors:

my.list <- list(e1 = c("a","b","c","k"),e2 = c("b","d","e"),e3 = c("t","d","g","a","f"))

I'm looking for a function that, for any character that appears more than once across the list's vectors (within each vector a character can appear only once), keeps only the first appearance.

The result list for this example would therefore be:

res.list <- list(e1 = c("a","b","c","k"),e2 = c("d","e"),e3 = c("t","g","f"))

Note that an entire vector in the list may be eliminated, so the number of elements in the resulting list does not necessarily have to equal that of the input list.

We can unlist the list, mark first occurrences with duplicated, relist that logical vector so it matches the structure of the list, and extract the elements of 'my.list' based on the resulting logical index:

un <- unlist(my.list)
res <- Map(`[`, my.list, relist(!duplicated(un), skeleton = my.list))
identical(res, res.list)
#[1] TRUE
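
To see why this works, it may help to look at the intermediate logical index. The sketch below prints it for the question's list and then runs the same steps on a hypothetical second input, my.list2 (not from the question), in which one vector is emptied completely. Dropping such vectors with Filter is an assumption about what the asker wants; without that step they remain as character(0).

```r
my.list <- list(e1 = c("a","b","c","k"), e2 = c("b","d","e"), e3 = c("t","d","g","a","f"))

# TRUE marks the first occurrence of each value in the flattened list
keep <- !duplicated(unlist(my.list))
# relist() reshapes the flat index so it has the same structure as my.list
idx <- relist(keep, skeleton = my.list)
str(idx)

# Hypothetical input: everything in e2 already appears in e1
my.list2 <- list(e1 = c("a", "b"), e2 = c("b", "a"), e3 = c("c", "a"))
res2 <- Map(`[`, my.list2, relist(!duplicated(unlist(my.list2)), skeleton = my.list2))
# res2$e2 is now character(0); Filter(length, ...) keeps only non-empty vectors
res2 <- Filter(length, res2)
names(res2)
```

Here idx$e2 is FALSE TRUE TRUE, because "b" was already seen in e1, and after filtering only e1 and e3 remain in res2.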

Here is an alternative using mapply with setdiff and Reduce.

# make a copy of my.list
res.list <- my.list
# take set difference between contents of list elements and accumulated elements
res.list[-1] <- mapply(setdiff, res.list[-1],
                       head(Reduce(c, my.list, accumulate = TRUE), -1),
                       SIMPLIFY = FALSE)  # always return a list, never a matrix

The first element of the list is kept as is. Each subsequent element is compared against the cumulative vector of all preceding elements, which Reduce builds with c and the accumulate=TRUE argument. head(..., -1) drops the final list item, which contains all elements, so that the lengths align.

This returns

res.list
$e1
[1] "a" "b" "c" "k"

$e2
[1] "d" "e"

$e3
[1] "t" "g" "f"

Note that in Reduce, we could replace c with function(x, y) unique(c(x, y)) and arrive at the same final output.
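
As a rough sketch of that variant, the code below builds the cumulative "seen so far" vectors with unique and then takes the set differences. The name seen is introduced here for illustration only, and Map is used in place of mapply so that the result is always a list.

```r
my.list <- list(e1 = c("a","b","c","k"), e2 = c("b","d","e"), e3 = c("t","d","g","a","f"))

# seen[[i]] holds the distinct values found in elements 1..i of my.list
seen <- Reduce(function(x, y) unique(c(x, y)), my.list, accumulate = TRUE)

# Element i + 1 is compared with seen[[i]], so the last cumulative vector is dropped
res.list <- my.list
res.list[-1] <- Map(setdiff, my.list[-1], head(seen, -1))
res.list
```

Keep in mind that setdiff also removes duplicates within its first argument, which is harmless here because each vector in the question contains a character at most once.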

I found the solutions here too complex for my understanding and looked for a simpler technique. Suppose you have the following list.

my_list <- list(a = c(1,2,3,4,5,5), b = c(1,2,2,3,3,4,4),
                d = c("Mary", "Mary", "John", "John"))

The following much simpler piece of code removes the duplicates within each vector.

# lapply always returns a list; sapply could simplify equal-length results to a matrix
lapply(my_list, unique)

You will end up with the following.

$a
[1] 1 2 3 4 5

$b
[1] 1 2 3 4

$d
[1] "Mary" "John"

There is beauty in simplicity!
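
It may be worth checking how this behaves on the list from the original question. The short sketch below applies unique to each vector of my.list. Since unique only looks inside one vector at a time, values repeated across vectors are left in place. The name per.vector is introduced here for illustration.

```r
my.list <- list(e1 = c("a","b","c","k"), e2 = c("b","d","e"), e3 = c("t","d","g","a","f"))

# unique() is applied to each vector separately
per.vector <- lapply(my.list, unique)

# No vector contains a repeat of its own, so nothing is removed
identical(per.vector, my.list)
#[1] TRUE
```

So this approach fits lists with repeats inside individual vectors, while the earlier answers handle repeats across vectors.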
