
Using aggregate in R to find unique values of one variable for the same value of another variable

I want to apply the aggregate function in this data frame:

A <- data.frame(c(1:2,1:2,2),c("a","c","b","c","d"))
colnames(A) <- c("ola","hi")
A

> A
  ola hi
1   1  a
2   2  c
3   1  b
4   2  c
5   2  d

to get a data frame with the sorted values of A$ola and, for each one, the unique values of A$hi, like this:

A <- data.frame(c(1:2),c("a,b","c,d"))
colnames(A) <- c("ola","hi")
> A
  ola  hi
1   1 a,b
2   2 c,d

I tried this code:

aggregate(A, by=list(A$ola), FUN=unique)

but it gives this result:

  Group.1 ola hi.1 hi.2
1       1   1    a    b
2       2   2    c    d

Could someone please explain what I am doing wrong?

In addition to the paste method, if we want the 'hi' column as a list:

r1 <- aggregate(hi~ola, unique(A), FUN=list)
r1
#  ola   hi
#1   1 a, b
#2   2 c, d
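The paste method referred to above isn't shown in this thread. A common form of it, assumed here to be what is meant, passes collapse = "," through aggregate()'s ... argument on to paste(). The sketch below rebuilds the question's data with hi stored as character (an assumption; in R < 4.0 the question's data.frame() call would make it a factor), shows how to pull one group's values out of the list column r1, and builds a hypothetical collapsed-string result r_paste for comparison.

```r
# Question's data, with hi kept as character (assumption)
A <- data.frame(ola = c(1, 2, 1, 2, 2),
                hi  = c("a", "c", "b", "c", "d"),
                stringsAsFactors = FALSE)

# List-column version: each cell holds a character vector of unique values
r1 <- aggregate(hi ~ ola, unique(A), FUN = list)
r1$hi[[1]]   # the unique values for ola == 1

# Collapsed-string version: arguments after FUN are forwarded to paste()
r_paste <- aggregate(hi ~ ola, unique(A), FUN = paste, collapse = ",")
r_paste
```

The list column keeps the values separate for later processing; the pasted column is easier to print or export.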

The OP's code gives a data.frame

r2 <- aggregate(hi~ola, A, FUN=unique)
r2
#  ola hi.1 hi.2
#1   1    a    b
#2   2    c    d

with two columns, where the second column, 'hi', is a matrix.

str(r2)
#'data.frame':  2 obs. of  2 variables:
#$ ola: int  1 2
#$ hi : chr [1:2, 1:2] "a" "c" "b" "d"
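The matrix appears because, with its default simplify = TRUE, aggregate() stacks the per-group results into a matrix whenever FUN returns a vector of the same length (greater than 1) for every group; when the lengths differ it keeps a list column instead. A small sketch, using the question's data (hi as character, an assumption) plus a hypothetical extra row "e" for ola == 1 so the group sizes become unequal; A3 and r3 are illustrative names only.

```r
A <- data.frame(ola = c(1, 2, 1, 2, 2),
                hi  = c("a", "c", "b", "c", "d"),
                stringsAsFactors = FALSE)

# Both groups have exactly two unique values, so the results are
# stacked row-wise into a 2 x 2 character matrix column
r2 <- aggregate(hi ~ ola, A, FUN = unique)
dim(r2$hi)        # 2 rows, 2 columns

# Hypothetical extra row: group 1 now has three unique values and group 2
# has two, so no matrix can be formed and hi becomes a list column
A3 <- rbind(A, data.frame(ola = 1, hi = "e", stringsAsFactors = FALSE))
r3 <- aggregate(hi ~ ola, A3, FUN = unique)
is.list(r3$hi)    # TRUE
```

So the shape of the result depends on the data, which is one reason to ask explicitly for a list (FUN = list) or a single string (paste with collapse).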

Another option:

library(dplyr)
distinct(A) %>% group_by(ola) %>% summarise(hi = toString(hi))

Which gives:

#Source: local data frame [2 x 2]
#
#    ola    hi
#  (int) (chr)
#1     1  a, b
#2     2     c
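For comparison, a base-R sketch of the same pipeline (not part of the original answer): unique() plays the role of distinct(), and toString() is passed as FUN in place of the summarise() call. It uses the question's five-row data (hi as character, an assumption) rather than the four-row Data block below, so group 2 comes out as "c, d" instead of "c"; r4 is an illustrative name.

```r
A <- data.frame(ola = c(1, 2, 1, 2, 2),
                hi  = c("a", "c", "b", "c", "d"),
                stringsAsFactors = FALSE)

# unique(A) drops duplicated rows; toString() joins each group's values with ", "
r4 <- aggregate(hi ~ ola, unique(A), FUN = toString)
r4
```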

Data

A <- structure(list(ola = c(1L, 2L, 1L, 2L), hi = structure(c(1L, 
3L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor")), .Names = c("ola", 
"hi"), row.names = c(NA, -4L), class = "data.frame")

If you really do want a column containing the unique items as a single comma-separated string, you just need a slightly more complex function.

# Collapse the unique values of x into one comma-separated string
uniqCSV <- function(x) { paste(unique(x), collapse = ',') }
aggregate(hi ~ ola, data = A, FUN= uniqCSV)
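The distinction between paste()'s sep and collapse arguments is what matters here: sep only joins separate arguments element by element, so a single vector comes back unchanged, while collapse joins the elements of one vector into one string. The sketch below shows this and then applies collapse to the question's original by = form, restricted to the hi column so that ola is not aggregated a second time; naming the grouping list element ola replaces the default Group.1 header. The data (hi as character) and the name r5 are assumptions for illustration.

```r
A <- data.frame(ola = c(1, 2, 1, 2, 2),
                hi  = c("a", "c", "b", "c", "d"),
                stringsAsFactors = FALSE)

paste(c("a", "b"), sep = ",")       # returns c("a", "b"): nothing is joined
paste(c("a", "b"), collapse = ",")  # returns "a,b"

# Question's by = call, aggregating only hi and naming the group column
r5 <- aggregate(A["hi"], by = list(ola = A$ola),
                FUN = function(x) paste(unique(x), collapse = ","))
r5
```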
