Remove duplicates within cells in specific columns in R?

Question

I have a data frame whose cells contain repeated characters or numbers. I want to create a new df which only contains the unique values in each cell of these columns. Below is a visual of what I am trying to achieve. Any ideas would be highly appreciated.

Answer 1

Here's a regex solution (based on mock data, since the question provides no reproducible data):

library(stringr)
df[,1:3] <- lapply(df[,1:3], function(x) str_extract_all(x, "(\\b\\w+\\b)(?!.*\\1)"))

The solution draws on a negative lookahead ( (?!...) ) and a backreference ( \\1 ): the pattern (\\b\\w+\\b)(?!.*\\1) makes str_extract_all keep an alphanumeric string only if it is not repeated later in the same string, so each value is captured exactly once, at its last occurrence:

Result:

df
               Title   Length    Prediction
1             George 555, 666           111
2 Alice, Peter, Kate 123, 444 333, 777, 222

Data:

df <- data.frame(
  Title = c("George,George,George", "Kate,Alice,Kate,Peter,Kate"),
  Length = c("555,555,666", "123,123,444,123,444"),
  Prediction = c("111,111,111", "222,333,222,777,222"), stringsAsFactors = FALSE)
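
To see what the lookahead does on a single cell, here is a minimal sketch. It swaps stringr for base R's gregexpr/regmatches with perl = TRUE (my assumption, chosen only so the snippet needs no packages); the pattern itself is the one from the answer, and the variable names x and kept are just for illustration.

```r
# One cell from the mock data
x <- "Kate,Alice,Kate,Peter,Kate"

# Same pattern as in the answer: a whole word that does NOT reappear later
pattern <- "(\\b\\w+\\b)(?!.*\\1)"

# perl = TRUE enables lookaheads and backreferences in base R
kept <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
kept

# Optional: collapse the matches back into one comma-separated string
collapsed <- paste(kept, collapse = ",")
collapsed
```

The first two occurrences of Kate are skipped because Kate appears again further right, which is why Kate ends up last in the result for that row.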

Answer 2

You can do it like this, using the df created by ChrisRuehlemann:

library(tidyverse)
df %>% mutate(across(everything(), ~str_split(., ",")),
              across(everything(), ~map(., ~unique(.x))))
               Title   Length    Prediction
1             George 555, 666           111
2 Kate, Alice, Peter 123, 444 222, 333, 777

Or as a one-liner:

mutate(df, across(everything(), ~map(str_split(., ","), ~unique(.x))))

               Title   Length    Prediction
1             George 555, 666           111
2 Kate, Alice, Peter 123, 444 222, 333, 777
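
Both answers leave list columns in the data frame. If plain character columns are preferred in the new df, a base R sketch along the following lines might work. The helper name dedupe_cells and the choice to paste the values back together with commas are my assumptions, not part of either answer.

```r
df <- data.frame(
  Title = c("George,George,George", "Kate,Alice,Kate,Peter,Kate"),
  Length = c("555,555,666", "123,123,444,123,444"),
  Prediction = c("111,111,111", "222,333,222,777,222"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each cell on the separator, keep the first
# occurrence of every value, and paste the pieces back into one string
dedupe_cells <- function(x, sep = ",") {
  vapply(strsplit(x, sep, fixed = TRUE),
         function(parts) paste(unique(parts), collapse = sep),
         character(1))
}

# Apply the helper only to the columns of interest, leaving df untouched
cols <- c("Title", "Length", "Prediction")
new_df <- df
new_df[cols] <- lapply(df[cols], dedupe_cells)
new_df
```

Because unique() keeps first occurrences, the order matches Answer 2 (Kate, Alice, Peter) rather than the regex answer.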
