How to merge rows that have the same information in all columns except one?

I have a large data frame that looks something like this:

A  1  2  3  4  ...
B  1  2  3  4  ...
C  1  2  3  4  ...
D  5  2  1  4  ...
E  3  2  3  9  ...
F  0  0  2  2  ...
G  0  0  2  2  ...

As you can see, some rows are duplicates of each other if you disregard the first column. I would like to merge these rows to generate something like this:

A;B;C  1  2  3  4  ...
D      5  2  1  4  ...
E      3  2  3  9  ...
F;G    0  0  2  2  ...

I could write a for loop that iterates over the rows, but that would be neither pretty nor efficient. I am pretty certain there's a better way to do this.

I thought I could:

  1. slice the df so I have all columns except the first: slice <- df[, 2:ncol(df)]
  2. get a data frame with all "duplicate" rows: dups <- df[duplicated(slice), ]
  3. get another data frame with the "unique" rows: uniq <- df[unique(slice)]
  4. merge them using all but the first column: merge(uniq, dups, by... )

Except that won't work, since unique doesn't return indices but a whole data frame, which means I cannot index df with the corresponding rows from slice.
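The indexing problem described above can be sidestepped because duplicated() returns a logical vector with one entry per row, so its negation can serve as the row index that unique() does not provide. A minimal sketch, assuming a small made-up data frame with hypothetical column names id, a and b (not from the original post):

```r
# Hypothetical sample data; the column names id, a, b are assumptions.
df <- data.frame(
  id = c("A", "B", "C", "D", "E", "F", "G"),
  a  = c(1, 1, 1, 5, 3, 0, 0),
  b  = c(2, 2, 2, 2, 2, 0, 0),
  stringsAsFactors = FALSE
)

# All columns except the first.
slice <- df[, 2:ncol(df)]

# duplicated() flags every row that repeats an earlier row of slice.
is_dup <- duplicated(slice)

uniq <- df[!is_dup, ]  # first occurrence of each distinct combination
dups <- df[is_dup, ]   # the later repeats

print(uniq)
print(dups)
```

This only splits the rows; joining the labels of each group still needs an aggregation step.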

Any suggestions?

EDIT: I should clarify that A, B, C... are not row names but actually part of the data frame, stored as character strings.

Several functions can do this, all of them common aggregation tools: aggregate, tapply, by, ..., and, of course, the functions in the popular "data.table" and "dplyr" packages.

Here's aggregate:

aggregate(V1 ~ ., mydf, toString)
#   V2 V3 V4 V5  V6      V1
# 1  0  0  2  2 ...    F, G
# 2  5  2  1  4 ...       D
# 3  1  2  3  4 ... A, B, C
# 4  3  2  3  9 ...       E
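To try the aggregate call on its own, mydf can be rebuilt from the sample rows in the question. This is a sketch under assumptions: the column names V1 to V5 are the defaults read.table would assign, and the elided "..." columns are left out.

```r
# Sample data rebuilt from the question; column names V1..V5 are assumed,
# and the elided "..." columns are omitted.
mydf <- data.frame(
  V1 = c("A", "B", "C", "D", "E", "F", "G"),
  V2 = c(1, 1, 1, 5, 3, 0, 0),
  V3 = c(2, 2, 2, 2, 2, 0, 0),
  V4 = c(3, 3, 3, 1, 3, 2, 2),
  V5 = c(4, 4, 4, 4, 9, 2, 2),
  stringsAsFactors = FALSE
)

# Group by every column except V1 and join the V1 labels within each group.
merged <- aggregate(V1 ~ ., data = mydf, FUN = toString)
print(merged)
```

The formula V1 ~ . means "summarise V1, grouped by all remaining columns", so no column names need to be listed by hand.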

Other options (as indicated in the opening paragraph):

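library(data.table)
# Group by every column except V1; toString() joins the V1 labels per group
as.data.table(mydf)[, toString(V1), by = eval(setdiff(names(mydf), "V1"))]

library(dplyr)
# Same idea, with the grouping columns listed explicitly
mydf %>%
  group_by(V2, V3, V4, V5, V6) %>%
  summarise(V1 = toString(V1))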
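tapply, mentioned earlier, can be sketched along the same lines. The example below assumes the sample data from the question with column names V1 to V5 (the elided "..." columns are omitted) and uses interaction() to build a single grouping factor; that choice is an illustration, not part of the original answer.

```r
# Sample data rebuilt from the question; column names V1..V5 are assumed.
mydf <- data.frame(
  V1 = c("A", "B", "C", "D", "E", "F", "G"),
  V2 = c(1, 1, 1, 5, 3, 0, 0),
  V3 = c(2, 2, 2, 2, 2, 0, 0),
  V4 = c(3, 3, 3, 1, 3, 2, 2),
  V5 = c(4, 4, 4, 4, 9, 2, 2),
  stringsAsFactors = FALSE
)

# Build one grouping factor from all columns except V1; drop = TRUE discards
# combinations of values that never occur in the data.
key <- interaction(mydf[-1], drop = TRUE)

# Join the V1 labels within each group.
labels <- tapply(mydf$V1, key, toString)
print(labels)
```

Unlike aggregate, this returns only the joined labels, named by the combined key, so it suits a quick check more than building the final data frame.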

Instead of toString, you can use the classic paste(., collapse = ";") approach, which gives you more flexibility over the final output.
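As a sketch of the paste variant, the following reproduces the "A;B;C" style labels the question asks for. The sample data and the column names V1 to V5 are assumptions rebuilt from the question, with the elided "..." columns omitted.

```r
# Sample data rebuilt from the question; column names V1..V5 are assumed.
mydf <- data.frame(
  V1 = c("A", "B", "C", "D", "E", "F", "G"),
  V2 = c(1, 1, 1, 5, 3, 0, 0),
  V3 = c(2, 2, 2, 2, 2, 0, 0),
  V4 = c(3, 3, 3, 1, 3, 2, 2),
  V5 = c(4, 4, 4, 4, 9, 2, 2),
  stringsAsFactors = FALSE
)

# Join the labels of each group with ";" instead of ", ".
merged <- aggregate(V1 ~ ., data = mydf,
                    FUN = function(x) paste(x, collapse = ";"))

# aggregate puts the summarised column last; restore the original order.
merged <- merged[, names(mydf)]
print(merged)
```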
