I have a large data frame that looks something like this:
A 1 2 3 4 ...
B 1 2 3 4 ...
C 1 2 3 4 ...
D 5 2 1 4 ...
E 3 2 3 9 ...
F 0 0 2 2 ...
G 0 0 2 2 ...
As you can see, some rows are duplicates if you ignore the first column. I would like to combine/merge these rows to get something like this:
A;B;C 1 2 3 4 ...
D 5 2 1 4 ...
E 3 2 3 9 ...
F;G 0 0 2 2 ...
I could write a for loop that iterates over the rows, but that would be neither pretty nor efficient. I am pretty certain there's a better way to do this.
I thought I could:
slice <- df[, 2:ncol(df)]
dups <- df[duplicated(slice), ]
uniq <- df[unique(slice)]
merge(uniq, dups, by... )
Except that won't work, since unique doesn't return indices but a whole data frame, which means I cannot index df with the corresponding rows from slice.
Any suggestions?
EDIT: I should clarify that A, B, C, ... are not row names but actually part of the data frame, stored as strings (character values).
There are several functions that can do this, all of them common aggregation functions: aggregate, tapply, by, ..., and, of course, the popular "data.table" and "dplyr" sets of functions.
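As an illustration of the tapply route, here is a minimal base R sketch. It assumes a hypothetical data frame mydf with the labels in V1 and only the four value columns visible in the question; building a row key with paste() and the "\r" separator are illustrative choices, not part of the answer above. Unlike aggregate, it keeps the groups in the order in which they first appear, and it uses !duplicated(key), which is a logical index, to pick one row per group.

```r
# Hypothetical reproducible version of the question's data: labels in V1,
# only the four value columns shown in the question (the "..." columns are omitted).
mydf <- data.frame(
  V1 = c("A", "B", "C", "D", "E", "F", "G"),
  V2 = c(1, 1, 1, 5, 3, 0, 0),
  V3 = c(2, 2, 2, 2, 2, 0, 0),
  V4 = c(3, 3, 3, 1, 3, 2, 2),
  V5 = c(4, 4, 4, 4, 9, 2, 2),
  stringsAsFactors = FALSE
)

# One key string per row, built from every column except the labels.
# Assumes "\r" never occurs inside the values themselves.
vals <- mydf[, -1]
key  <- do.call(paste, c(vals, sep = "\r"))

# tapply() splits the labels by key and collapses each group with ";".
labels <- tapply(mydf$V1, key, paste, collapse = ";")

# !duplicated(key) is a logical index selecting the first row of each pattern.
first  <- !duplicated(key)
result <- data.frame(V1 = as.vector(labels[key[first]]), vals[first, ],
                     stringsAsFactors = FALSE)
rownames(result) <- NULL
result
```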
Here's aggregate:
aggregate(V1 ~ ., mydf, toString)
# V2 V3 V4 V5 V6 V1
# 1 0 0 2 2 ... F, G
# 2 5 2 1 4 ... D
# 3 1 2 3 4 ... A, B, C
# 4 3 2 3 9 ... E
Other options (as indicated in the opening paragraph):
library(data.table)
as.data.table(mydf)[, toString(V1), by = eval(setdiff(names(mydf), "V1"))]
library(dplyr)
mydf %>%
  group_by(V2, V3, V4, V5, V6) %>%
  summarise(V1 = toString(V1))
Instead of toString, you can use the classic paste(., collapse = ";") approach, which gives you more control over the final output (for example, "A;B;C" instead of "A, B, C").
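For example, aggregate passes extra arguments on to its FUN, so paste with collapse = ";" yields labels in exactly the form asked for in the question. A minimal sketch, again on a hypothetical reproducible version of the data (labels in V1, only the four value columns shown in the question):

```r
# Hypothetical reproducible data: labels in V1, four value columns (V2 to V5).
mydf <- data.frame(
  V1 = c("A", "B", "C", "D", "E", "F", "G"),
  V2 = c(1, 1, 1, 5, 3, 0, 0),
  V3 = c(2, 2, 2, 2, 2, 0, 0),
  V4 = c(3, 3, 3, 1, 3, 2, 2),
  V5 = c(4, 4, 4, 4, 9, 2, 2),
  stringsAsFactors = FALSE
)

# Group by every column except V1 ("."); collapse = ";" is forwarded to
# paste(), which joins the labels within each group.
res <- aggregate(V1 ~ ., data = mydf, FUN = paste, collapse = ";")

# aggregate() puts the response column last; move it back to the front.
res <- res[, c("V1", setdiff(names(res), "V1"))]
res
```

Note that aggregate sorts the groups by the grouping columns rather than keeping the original row order.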