
Keep unique elements of each vector in a list of vectors

I have a data frame with 1.6 million rows, and one of the columns is a list of character vectors.

Each element of this list column looks like this: c("A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61Q", "B05B")

I would like it to become c("A61K", "A61Q", "B05B").

That is, I only want to keep the unique values, and this should be done for every row.

I have tried this:

# fixed = TRUE matches "|" literally; as a regex, "|" would split every character
sapply(strsplit(try, "|", fixed = TRUE), function(x) paste0(unique(x), collapse = ","))

I have also tried solutions using for loops, but they take very long and R stops responding.

Use unique():

> string <- c("A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61Q", "B05B")
> unique(string)
[1] "A61K" "A61Q" "B05B"
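One detail worth knowing before applying this row by row: unique() returns values in the order they first appear, it does not sort them. A minimal illustration (the vector `codes` is made up for the example); wrap the call in sort() if sorted output is wanted:

```r
# unique() keeps values in order of first appearance; it does not sort
codes <- c("B05B", "A61K", "A61K", "A61Q", "B05B")
in_order <- unique(codes)        # "B05B" "A61K" "A61Q"
sorted   <- sort(unique(codes))  # "A61K" "A61Q" "B05B"
print(in_order)
print(sorted)
```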

You can handle it by calling unique() inside lapply():

# example df with list column
dat <- data.frame(id = 1:2)
dat$x <- list(
  c("A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61Q", "B05B"),
  c("A62K", "A61K", "A61K", "A58J", "A61K", "A61K", "A61K", "A61K", "A61K", "A61K", "A61Q", "C97B")
)

dat 
  id                                                                      x
1  1 A61K, A61K, A61K, A61K, A61K, A61K, A61K, A61K, A61K, A61K, A61Q, B05B
2  2 A62K, A61K, A61K, A58J, A61K, A61K, A61K, A61K, A61K, A61K, A61Q, C97B
# remove duplicates within list column by row
dat$x <- lapply(dat$x, unique)

dat
  id                            x
1  1             A61K, A61Q, B05B
2  2 A62K, A61K, A58J, A61Q, C97B
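With 1.6 million rows, calling unique() once per row through lapply() may still take a while. A possible alternative, sketched here under the assumption that the column is a plain list of atomic vectors, flattens everything, marks the first occurrence of each value within its row with a single duplicated() call, and splits the result back into a list. The helper name unique_per_row() is hypothetical, and the speed-up has not been benchmarked here, so compare both versions on your own data.

```r
# Hypothetical helper (not from the answers above): remove duplicates inside
# every element of a list of vectors without an R-level call per element.
unique_per_row <- function(lst) {
  n     <- length(lst)
  ids   <- rep.int(seq_len(n), lengths(lst))  # row index of every value
  vals  <- unlist(lst, use.names = FALSE)     # all values as one long vector
  lvls  <- unique(vals)
  codes <- match(vals, lvls)                  # integer code per distinct value
  # One number per (row, value) pair; doubles avoid integer overflow
  key   <- (as.numeric(ids) - 1) * length(lvls) + codes
  keep  <- !duplicated(key)                   # first occurrence within each row
  # Fixed factor levels keep empty rows as character(0) and preserve row order
  out   <- split(vals[keep], factor(ids[keep], levels = seq_len(n)))
  unname(out)                                 # drop the "1", "2", ... names
}
```

Usage on the example above would be dat$x <- unique_per_row(dat$x). Within each row it keeps values in first-appearance order, so the result should match lapply(dat$x, unique).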

To filter the rows of a data frame instead, use duplicated().

If this is your data:

df
    str data
1  A61K    1
2  A61K   23
3  A61K    4
4  A61K    3
5  A61K    1
6  A61K   23
7  A61K    4
8  A61K    3
9  A61K    1
10 A61K   23
11 A61Q    4
12 B05B    3

apply the filter using the desired column:

df[!duplicated(df$str), ]
    str data
1  A61K    1
11 A61Q    4
12 B05B    3
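duplicated() flags every occurrence after the first, so the filter above keeps the first row for each value of str. A short sketch that rebuilds the printed example data and also uses the fromLast = TRUE argument of duplicated() to keep the last row per value instead:

```r
# Rebuild the example data frame printed above
df <- data.frame(
  str  = c(rep("A61K", 10), "A61Q", "B05B"),
  data = rep(c(1, 23, 4, 3), times = 3)
)

# Keep the first row for each value of str (as in the answer)
first <- df[!duplicated(df$str), ]

# fromLast = TRUE scans from the end, so the last row per value is kept
last <- df[!duplicated(df$str, fromLast = TRUE), ]

print(first)  # rows 1, 11, 12
print(last)   # rows 10, 11, 12
```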


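Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.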

Related questions:
R aggregate list of vectors to unique combinations of n vector elements and sum equal combinations
How to join two vectors' elements in a list and keep names according to origin vector?
Dataframe from a vector and a list of vectors by replicating elements
R referring to elements of a vector in a list of vectors
Set all vector elements to NA in a list of vectors
Add a character to each vector in a list of vectors in R
R - Given a List of Vectors replace values in Vectors for each Vector with LOOP
How to combine multiple vectors such that elements of each vector are distributed as equally as possible?
How to combine multiple vectors into one while replacing elements in each vector
How to return the unique elements between vectors while retaining the source vector of these unique elements?
 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM