Collapsing rows by column (character, numeric, factor, etc.)

Question

I am trying to collapse this data, but I am having trouble. The dataset is huge: more than 100 columns and over 1,000 rows.

This is an example of what the dataset looks like:

https://i.stack.imgur.com/8iZq7.png

I need to be able to collapse the rows together. I cannot add the values inside the Lab columns together, because the result would be greater than 1.
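A minimal illustration of why summing does not work, assuming the Lab columns are 0/1 flags and that a collapsed row should hold 1 whenever any duplicate row holds 1 (the values below are made up, not read from the screenshot):

```r
# Hypothetical LAB1 values for one ID spread over three duplicate rows
lab1 <- c(1, NA, 1)

total <- sum(lab1, na.rm = TRUE)  # 2: adding 0/1 flags can exceed 1
flag  <- max(lab1, na.rm = TRUE)  # 1: "did any row have a 1?"
```

max(..., na.rm = TRUE) stays within 0/1, although it returns -Inf (with a warning) when every value in the group is NA.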

I have tried several pieces of code, but none of them work because they don't account for the mix of character, numeric, and timestamp columns in my data frame.

These are the attempts I made, with the errors I got:

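COLLAPSE6 <- setDT(TRIALBJH4)[, lapply(.SD, function(x) {
                      x <- unique(x[!is.na(x)])      # distinct non-NA values within the ID
                      if (length(x) == 1) as.character(x)
                      else if (length(x) == 0) NA_character_
                      else collapse = ","            # assigns and returns ","; the values are never pasted
                    }),
             by = ID]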

This just put a comma into the columns that had multiple values, when I need each cell to be 0, 1, or NA.
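One way to get a single typed value per column, instead of a string, is to reduce each column with a type-aware rule. This data.table sketch assumes the maximum is wanted for numeric and date-time columns (so a 0/1 flag becomes 1 if any row has a 1) and the first non-NA value for character and factor columns. `collapse_col` and the toy table are hypothetical; the column names only mimic the question.

```r
library(data.table)

# Hypothetical helper: reduce one column of one ID group to a single value
collapse_col <- function(x) {
  keep <- x[!is.na(x)]
  if (length(keep) == 0) return(x[NA_integer_])  # all NA: an NA of the same type/class
  if (is.numeric(keep) || inherits(keep, c("POSIXct", "Date"))) {
    return(max(keep))                            # flags, ages, timestamps: largest value
  }
  keep[1]                                        # character/factor: first non-NA value
}

dt <- data.table(
  ID   = c(1, 1, 1, 2, 2),
  SEX  = c("M", NA, "M", "F", "F"),
  LAB1 = c(NA, 0, 1, NA, NA),
  LAB3 = c(0, NA, NA, 1, 0)
)

# .SD holds every non-grouping column, so no column has to be named
res <- dt[, lapply(.SD, collapse_col), by = ID]
```

Unlike the attempt above, nothing is converted to character, so the LAB columns stay numeric (0, 1, or NA), and every group returns the same type for a given column, which data.table needs when stacking the groups.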

COLLAPSE3 %>%
  group_by(ID) %>%
  summarise_all(funs(list(na.omit)))

This just replaced every column not listed in group_by() with funs(list(na.omit)), even overwriting the values.
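A likely explanation, assuming funs() evaluates the expression list(na.omit) as written: that expression never calls na.omit on the column; it only wraps the function object in a list, so every cell ends up holding the function. A base-R sketch of the difference (values are made up):

```r
x <- c(1, NA, 0)

bad  <- list(na.omit)                # a list holding the function itself; x is never used
good <- list(as.vector(na.omit(x)))  # a list holding x's non-missing values

is.function(bad[[1]])  # TRUE
good[[1]]              # 1 0
```

Even the corrected form gives a list-column with several values per ID, not the single 0/1/NA per cell that is wanted.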

bjh_sti_merge1 <- bjh_sti_merg6 %>% group_by (ID) %>%
  summarise_each(funs(max(., na.rm = TRUE)))

This doesn't work: it freezes R, and I always have to force-quit it.

bjh_sti_merg10 <- bjh_sti_merg6 %>% group_by (ID) %>%
  summarise(AGE = max(AGE, na.rm=TRUE),
            LAB1 = max(LAB1, na.rm=TRUE),
            LAB3 = max(LAB3, na.rm=TRUE))

This one doesn't work either: it just takes the first of the duplicated rows. I can't use that, because sometimes the first row is NA while the third row has a 1 in that column. It also seems to freeze R when there are more than 20 columns.

xx <-function(x) x[!is.na(x)]

bjh_sti_merg7 %>% 
  group_by(EPIC_MRN) %>%
  summarise_all(funs(xx))

This doesn't work either. It fails with:

Error: Problem with 'summarise()' input 'LAB1'.
x Input 'LAB1' must be size 0 or 1, not 2.
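summarise() must produce exactly one value per group; `xx()` returns every non-missing value, so an ID with two non-NA LAB1 entries yields two values and triggers the size error. A sketch of a reducer that always returns length one, assuming the first non-missing value is acceptable (`first_non_na` is a hypothetical name):

```r
xx <- function(x) x[!is.na(x)]               # the helper from the question
first_non_na <- function(x) x[!is.na(x)][1]  # hypothetical: always length 1

lab1 <- c(0, 1, NA)
length(xx(lab1))                     # 2: summarise() rejects this
length(first_non_na(lab1))           # 1
first_non_na(c(NA, "M", "M"))        # "M"
first_non_na(c(NA_real_, NA_real_))  # NA (indexing past the end gives NA)
```

For the 0/1 Lab columns, the first non-NA value would still miss a 1 that appears in a later row, so a maximum fits those columns better.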

I want the end result to have one row per ID. The code needs to work for all column types (character, numeric, timestamps, factors, etc.) and must not freeze RStudio. summarise_each was always recommended to me, but it kept freezing my laptop (I let it run for over 2 hours with no result). And yes, I have loaded tidyverse, data.table, and dplyr.

It also needs to handle NA values!
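Handling NA needs one extra guard: max(x, na.rm = TRUE) on a group where every value is missing returns -Inf with a warning rather than NA. A sketch of a guarded version (`safe_max` is a hypothetical name):

```r
# Hypothetical helper: max that returns NA (of the column's type) for all-NA input
safe_max <- function(x) {
  if (all(is.na(x))) return(x[NA_integer_])  # NA instead of -Inf
  max(x, na.rm = TRUE)
}

all_na <- c(NA_real_, NA_real_)
suppressWarnings(max(all_na, na.rm = TRUE))  # -Inf
safe_max(all_na)                             # NA
safe_max(c(NA, 0, 1))                        # 1
```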

I would like the result to look like this: https://i.stack.imgur.com/yBehQ.png

Answer 1

See whether this works; it may take some time to run:

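plyr::ddply(df, plyr::.(ID), function(x) {
  res <- x[1, ]                             # start from the group's first row
  if (nrow(x) == 1) return(res)             # a single row needs no collapsing
  for (i in seq_len(ncol(x))) {
    if (!is.numeric(x[[i]])) next           # non-numeric columns keep the first row's value
    if (all(is.na(x[[i]]))) next            # all NA: keep NA instead of max() returning -Inf
    res[1, i] <- max(x[[i]], na.rm = TRUE)  # a 1 in any row wins over 0 or NA
  }
  res
})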

Answer 2

This task should be straightforward. However, it is not clear to me how you wish to summarize the AGE, TIME, LAB1 and LAB2 columns, so for simplicity's sake I have used max(col, na.rm = TRUE).

library(dplyr)
library(tibble)

data <- tibble(
  ID = c(1, 1, 1, 2, 2, 3, 4, 5, 5, 6, 6, 7),
  SEX = c("M", "M", "M", "F", "F", "M", "M", "F", "F", "M", "M", "F"),
  AGE = c(30, 30, 30, 22, 22, 55, 90, 87, 87, 23, 23, 45),
  TIME = as.POSIXct(rep("02/19/2019 12:00", 12), format = "%m/%d/%Y %H:%M", tz = ""),
  LAB1 = c(0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0),
  LAB2 = c(1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1)
)

data <- data %>%
  group_by(ID, SEX) %>%
  summarize(AGE = max(AGE, na.rm = TRUE),
            TIME = max(TIME, na.rm = TRUE),
            LAB1 = max(LAB1, na.rm = TRUE),
            LAB2 = max(LAB2, na.rm = TRUE))

This gives the following result:

> data
# A tibble: 7 x 6
# Groups:   ID [7]
     ID SEX     AGE TIME                 LAB1  LAB2
  <dbl> <chr> <dbl> <dttm>              <dbl> <dbl>
1     1 M        30 2019-02-19 12:00:00     1     1
2     2 F        22 2019-02-19 12:00:00     1     1
3     3 M        55 2019-02-19 12:00:00     1     1
4     4 M        90 2019-02-19 12:00:00     1     1
5     5 F        87 2019-02-19 12:00:00     0     0
6     6 M        23 2019-02-19 12:00:00     1     1
7     7 F        45 2019-02-19 12:00:00     0     1
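To cover 100+ columns without naming each one, a per-column rule can be applied with across() (dplyr 1.0.0 or later), which supersedes summarise_each()/summarise_all() and funs(). This is a sketch, not the answer's own code: it assumes the maximum for numeric and date-time columns, the first non-NA value for everything else, and NA when a column is entirely missing for an ID. `collapse_col` is a hypothetical helper and the toy data only mimics the question.

```r
library(dplyr)

# Hypothetical helper: one value per column per ID, keeping the column's type
collapse_col <- function(x) {
  keep <- x[!is.na(x)]
  if (length(keep) == 0) return(x[NA_integer_])  # all NA -> NA, not -Inf
  if (is.numeric(keep) || inherits(keep, c("POSIXct", "Date"))) return(max(keep))
  keep[1]                                        # character/factor: first non-NA value
}

toy <- tibble(
  ID   = c(1, 1, 1, 2, 2),
  SEX  = factor(c("M", NA, "M", "F", "F")),
  TIME = as.POSIXct(c("2019-02-19 12:00", NA, "2019-02-20 08:00", NA, NA),
                    format = "%Y-%m-%d %H:%M", tz = "UTC"),
  LAB1 = c(NA, 0, 1, NA, NA)
)

# across(everything()) skips the grouping column ID automatically
out <- toy %>%
  group_by(ID) %>%
  summarise(across(everything(), collapse_col), .groups = "drop")
```

Compared with max() on every column, this keeps character and factor columns intact and returns NA rather than -Inf for IDs where a column has no values.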


Answer 1
0 2020-08-10 08:19:03

Answer 2
0 2020-08-10 08:22:26
