R: collapse multiple rows into 1 row - same columns

Question

This piggybacks on a question I answered last night, as I am reconsidering how I'd like to format my data. I did search but couldn't come up with any applicable answer; I may be searching with the wrong terms.

I have a data table with many rows that I'd like to combine:

record_numb <- c(1,1,1,2,2,2)
col_a <- c(123,'','',987,'','')
col_b <- c('','234','','','765','')
col_c <- c('','','543','','','543')
df <- data.frame(record_numb,col_a,col_b,col_c)
library(data.table)
setDT(df)

record_numb    col_a    col_b    col_c
1              123
1                       234
1                                543
2              987
2                       765
2                                543

Each row will always have exactly one of col_a, col_b, or col_c populated, never more than one of the three. I'd like to pivot(?) these into a single row per record so it looks like this:

record_numb    col_a    col_b    col_c
1              123      234      543
2              987      765      543

I played with melt/cast a bit, but I'm such a novice at R that half of my issue is knowing what is available to use. There is so much out there that I'm hoping one of you can point me to a package or function off the top of your head. My searches pointed me to melt and cast and such, but I was unable to apply them to this case. I'm open to using any function or package.

Answer 1

Since you mentioned in your comment that you would like a data.table solution, you could use

library(data.table)
df <- data.table(record_numb,col_a,col_b,col_c)

df[, lapply(.SD, paste0, collapse=""), by=record_numb]
   record_numb col_a col_b col_c
1:           1   123   234   543
2:           2   987   765   543

.SD basically says, "take all the variables in my data.table" except those in the by argument. In @Frank's answer, he reduces the set of variables using .SDcols. If you want to cast the variables to numeric, you can still do this in one line. Here is a chaining method.

df[, lapply(.SD, paste0, collapse=""), by=record_numb][, lapply(.SD, as.integer)]

The second "chain" casts all the variables as integers. Because it has no by argument, that includes record_numb.
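To make the .SD / .SDcols distinction concrete, here is a minimal sketch using the question's toy data. Restricting the work to col_a and col_b is purely an illustrative choice, and the `cols` variable is a name introduced for this example.

```r
library(data.table)

# Same toy data as in the question; mixing 123 with '' makes every col_* character.
df <- data.table(
  record_numb = c(1, 1, 1, 2, 2, 2),
  col_a = c(123, '', '', 987, '', ''),
  col_b = c('', '234', '', '', '765', ''),
  col_c = c('', '', '543', '', '', '543')
)

# .SDcols limits .SD to the named columns (here col_a and col_b only).
cols <- c("col_a", "col_b")
res <- df[, lapply(.SD, paste0, collapse = ""), by = record_numb, .SDcols = cols]

# Convert just those columns by reference instead of chaining over every column.
res[, (cols) := lapply(.SD, as.integer), .SDcols = cols]
print(res)
```

Without .SDcols, .SD would also include col_c, which is what the one-liner in the answer relies on.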

Answer 2

You can reshape to long format, drop the blank entries and then go back to wide:

res <- dcast(melt(df, id.vars = "record_numb")[ value != "" ], record_numb ~ variable)

   record_numb col_a col_b col_c
1:           1   123   234   543
2:           2   987   765   543
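If the one-liner is hard to follow, the same reshape can be sketched step by step. The intermediate name `long` and the explicit `value.var` argument are additions for illustration; the answer's version lets dcast guess the value column.

```r
library(data.table)

# Same toy data as in the question.
df <- data.table(
  record_numb = c(1, 1, 1, 2, 2, 2),
  col_a = c(123, '', '', 987, '', ''),
  col_b = c('', '234', '', '', '765', ''),
  col_c = c('', '', '543', '', '', '543')
)

# Step 1: wide -> long, one row per (record_numb, variable) pair.
long <- melt(df, id.vars = "record_numb")

# Step 2: keep only the populated cells.
long <- long[value != ""]

# Step 3: long -> wide; value.var is spelled out so dcast does not have to guess.
res <- dcast(long, record_numb ~ variable, value.var = "value")
print(res)
```

Because each record has exactly one non-blank value per column, every cell in the wide result receives a single value and no aggregation function is needed.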

You may find it more readable at first using magrittr:

library(magrittr)
res = df %>% 
  melt(id.vars = "record_numb") %>% 
  .[ value != "" ] %>% 
  dcast(record_numb ~ variable)

The numbers are still formatted as strings, but you can convert them with...

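cols = setdiff(names(res), "record_numb")
# as.is = TRUE keeps non-numeric columns as character rather than turning them into factors
res[, (cols) := lapply(.SD, type.convert, as.is = TRUE), .SDcols = cols]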

Type conversion will change each column to whatever class it looks like it should be (numeric, integer, whatever). See ?type.convert.
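As a quick illustration of how type.convert picks a class, here is a small base R sketch. The vectors x, y and z are made up for the example and are not part of the question's data.

```r
# Character vectors of the kind that come out of the reshape.
x <- c("123", "987")  # looks like whole numbers
y <- c("1.5", "2")    # looks like decimals
z <- c("a", "b")      # not numeric at all

cx <- class(type.convert(x, as.is = TRUE))
cy <- class(type.convert(y, as.is = TRUE))
cz <- class(type.convert(z, as.is = TRUE))

print(c(cx, cy, cz))  # "integer" "numeric" "character"
```

With as.is = TRUE, strings that do not look numeric stay character instead of becoming factors.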

Answer 3

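Convert the columns to numeric first (sum() does not accept character columns, and the blanks become NA), then summarise:

library(dplyr)
df = df %>%
    mutate(across(c(col_a, col_b, col_c), as.numeric)) %>%  # '' becomes NA
    group_by(record_numb) %>%
    summarise(col_a = sum(col_a, na.rm = TRUE),
              col_b = sum(col_b, na.rm = TRUE),
              col_c = sum(col_c, na.rm = TRUE))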

In place of sum you could use min, max, or whatever fits.
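A minimal sketch of that swap, using max instead of sum. It assumes dplyr 1.0 or later for across(), and it selects the columns with starts_with("col_"), which is a convenience chosen for this example rather than something from the answer. The columns are converted to numeric first because the question's data stores them as strings.

```r
library(dplyr)

# Same toy data as in the question, built as a plain data.frame.
df <- data.frame(
  record_numb = c(1, 1, 1, 2, 2, 2),
  col_a = c(123, '', '', 987, '', ''),
  col_b = c('', '234', '', '', '765', ''),
  col_c = c('', '', '543', '', '', '543'),
  stringsAsFactors = FALSE
)

res <- df %>%
  # Blank strings turn into NA when converted to numeric.
  mutate(across(starts_with("col_"), as.numeric)) %>%
  group_by(record_numb) %>%
  # Each group has one non-NA value per column, so max() simply picks it out.
  summarise(across(starts_with("col_"), ~ max(.x, na.rm = TRUE)))

print(res)
```

Since each record has only one populated value per column, sum, min and max all return the same result here.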


Answer 1: 6, accepted, 2016-12-09 21:22:04

Answer 2: 5, 2016-12-09 21:22:10

Answer 3: 3, 2018-10-05 06:03:42
