How to merge rows that have the same information in all columns except one?

Question

I have a large data frame that looks something like this:

A  1  2  3  4  ...
B  1  2  3  4  ...
C  1  2  3  4  ...
D  5  2  1  4  ...
E  3  2  3  9  ...
F  0  0  2  2  ...
G  0  0  2  2  ...

As you can see, some rows are duplicate entries if you disregard the first column. I would like to combine/merge these rows to generate something like this:

A;B;C  1  2  3  4  ...
D      5  2  1  4  ...
E      3  2  3  9  ...
F;G    0  0  2  2  ...

I could write a for loop that iterates over the rows, but that would be neither pretty nor efficient. I am pretty certain there's a better way to do this.

I thought I could:

slice the df so I have all columns except the first: slice <- df[, 2:ncol(df)]
get a data frame with all "duplicate" rows: dups <- df[duplicated(slice), ]
get another data frame with the "unique" rows: uniq <- df[unique(slice)]
merge them using all but the first column: merge(uniq, dups, by... )

Except that won't work, since unique doesn't return indices but a whole data frame, which means I cannot index df with the corresponding rows from slice.

Any suggestions?

EDIT: I should clarify that A, B, C, ... are not row names but an actual column of the data frame, with entries stored as strings (character).

Answer 1

Several functions will do this. All of them are common aggregation tools: aggregate, tapply, by, ..., and, of course, the popular "data.table" and "dplyr" packages.

Here's aggregate:

aggregate(V1 ~ ., mydf, toString)
#   V2 V3 V4 V5  V6      V1
# 1  0  0  2  2 ...    F, G
# 2  5  2  1  4 ...       D
# 3  1  2  3  4 ... A, B, C
# 4  3  2  3  9 ...       E
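To try the call above on something reproducible, here is a minimal sketch that rebuilds the sample rows from the question. The column names V1 to V5 are an assumption (they match what read.table assigns by default), and the columns elided with "..." in the question are simply left out.

```r
# Sample data from the question; the column names V1..V5 are assumed.
mydf <- data.frame(
  V1 = c("A", "B", "C", "D", "E", "F", "G"),
  V2 = c(1, 1, 1, 5, 3, 0, 0),
  V3 = c(2, 2, 2, 2, 2, 0, 0),
  V4 = c(3, 3, 3, 1, 3, 2, 2),
  V5 = c(4, 4, 4, 4, 9, 2, 2),
  stringsAsFactors = FALSE
)

# "V1 ~ ." groups by every column except V1;
# toString joins the V1 values of each group with ", ".
res <- aggregate(V1 ~ ., data = mydf, FUN = toString)
print(res)
```

Note that the grouping columns come first in the result and V1 ends up last.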

Other options (as indicated in the opening paragraph):

library(data.table)
as.data.table(mydf)[, toString(V1), by = eval(setdiff(names(mydf), "V1"))]

library(dplyr)
mydf %>%
  group_by(V2, V3, V4, V5, V6) %>%
  summarise(V1 = toString(V1))

Instead of toString, you can use the classic paste(., collapse = ";") approach, which gives you more flexibility over the final output.
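As a rough sketch of the paste variant, the following reproduces the "A;B;C" style asked for in the question. It assumes the same hypothetical column names V1 to V5 as before and relies on aggregate passing extra arguments (here collapse) through to FUN.

```r
# Sample data from the question; the column names V1..V5 are assumed.
mydf <- data.frame(
  V1 = c("A", "B", "C", "D", "E", "F", "G"),
  V2 = c(1, 1, 1, 5, 3, 0, 0),
  V3 = c(2, 2, 2, 2, 2, 0, 0),
  V4 = c(3, 3, 3, 1, 3, 2, 2),
  V5 = c(4, 4, 4, 4, 9, 2, 2),
  stringsAsFactors = FALSE
)

# collapse = ";" is forwarded to paste(), so each group becomes "A;B;C" etc.
res <- aggregate(V1 ~ ., data = mydf, FUN = paste, collapse = ";")

# Move the label column back to the front, as in the desired output.
res <- res[, c("V1", setdiff(names(res), "V1"))]
print(res)
```

Any other separator works the same way; only the collapse argument changes.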

Answer 1: score 2, accepted, 2015-03-03 11:08:58
