R: unite every distinct value in one column into another
I have data that looks something like this (but much larger, around 100,000 rows):
ID CODE
1 A F1
2 A F2
3 B F3
4 B F1
5 C F1
6 C F1
7 C F2
I need to collect all distinct CODEs for each ID into a single column. I have gotten halfway there with:
Data %>% arrange(ID) %>% group_by(ID) %>% distinct(CODE)
CODE ID
<fct> <fct>
1 F1 A
2 F2 A
3 F3 B
4 F1 B
5 F1 C
6 F2 C
But what I need looks something like this (where column all_CODEs holds all the codes for each ID, written into one string):
ID all_CODEs
1 A F1 F2
2 B F3 F1
3 C F1 F2
Can anyone help?
If you are up for a base R solution, assuming df is your data frame:
df1 <- df[!duplicated(df), ]  ## remove rows that are duplicated across all columns of df
aggregate( CODE ~ ID, data=df1, paste0, collapse=" ")
Output:
#   ID  CODE
# 1  A F1 F2
# 2  B F3 F1
# 3  C F1 F2
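A possible base R variant, sketched here as an assumption rather than part of the original answer: tapply() can split CODE by ID and apply unique() inside each group, so the separate duplicated() step is not needed. The toy data below re-creates the question's example; the names all_codes and res are hypothetical.

```r
# Toy data re-creating the question's example (hypothetical name `df`, as in this answer)
df <- data.frame(
  ID   = c("A", "A", "B", "B", "C", "C", "C"),
  CODE = c("F1", "F2", "F3", "F1", "F1", "F1", "F2"),
  stringsAsFactors = FALSE
)

# tapply() splits CODE by ID (groups come out in sorted ID order) and, for each
# group, keeps the first occurrence of every code and joins them with spaces
all_codes <- tapply(df$CODE, df$ID, function(x) paste(unique(x), collapse = " "))

# Convert the named result back into a two-column data frame
res <- data.frame(
  ID        = names(all_codes),
  all_CODEs = as.vector(all_codes),
  stringsAsFactors = FALSE
)
res
```

Codes keep their order of first appearance within each ID. If CODE is a factor, as in the question's output, paste() converts it to its labels, so the result is the same.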
After the distinct step, we can summarise by collapsing the CODE values of each ID into a single string with str_c (stringr's counterpart to paste):
library(dplyr)
library(stringr)
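Data %>%
  arrange(ID) %>%    # sort rows by ID
  distinct() %>%     # drop rows repeated across both ID and CODE
  group_by(ID) %>%
  summarise(all_CODEs = str_c(CODE, collapse = ' '))  # join each ID's codes with spaces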
# A tibble: 3 x 2
# ID all_CODEs
# <chr> <chr>
#1 A F1 F2
#2 B F3 F1
#3 C F1 F2
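A more compact dplyr variant, offered as a sketch rather than the answer's own method: calling unique() inside summarise() removes repeated codes per group, so distinct() and arrange() are not needed (summarise() already returns groups sorted by ID). The .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Same example data as in this answer
Data <- data.frame(
  ID   = c("A", "A", "B", "B", "C", "C", "C"),
  CODE = c("F1", "F2", "F3", "F1", "F1", "F1", "F2"),
  stringsAsFactors = FALSE
)

res <- Data %>%
  group_by(ID) %>%
  # unique() drops repeated codes inside each group; paste() joins them with spaces
  summarise(all_CODEs = paste(unique(CODE), collapse = " "), .groups = "drop")
res
```

To list the codes alphabetically instead of in order of first appearance, wrap unique(CODE) in sort().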
NOTE: on ungrouped data, distinct on a single column returns only that column with its distinct rows, because by default .keep_all = FALSE. Here, distinct should be applied to both columns, which is what distinct() with no arguments does.
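The following sketch illustrates the note on the question's data. It compares distinct() on one column, with .keep_all = TRUE, with no arguments, and on grouped data. In the grouped case the grouping column ID is kept, consistent with the output shown in the question. The object names are hypothetical.

```r
library(dplyr)

Data <- data.frame(
  ID   = c("A", "A", "B", "B", "C", "C", "C"),
  CODE = c("F1", "F2", "F3", "F1", "F1", "F1", "F2"),
  stringsAsFactors = FALSE
)

# One column, ungrouped: only CODE comes back, one row per distinct value
only_code <- Data %>% distinct(CODE)

# .keep_all = TRUE keeps the other columns, taken from the first row of each CODE
keep_all <- Data %>% distinct(CODE, .keep_all = TRUE)

# No arguments: de-duplicate on all columns (each ID/CODE pair once)
both <- Data %>% distinct()

# Grouped data: the grouping column ID is kept alongside CODE
grouped <- Data %>% group_by(ID) %>% distinct(CODE)
```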
Data used in this answer:
Data <- structure(list(ID = c("A", "A", "B", "B", "C", "C", "C"),
                       CODE = c("F1", "F2", "F3", "F1", "F1", "F1", "F2")),
                  class = "data.frame",
                  row.names = c("1", "2", "3", "4", "5", "6", "7"))