How to aggregate multiple columns in a data.frame using a custom function in R?

Question

I've got a data.frame dt with some duplicate keys and missing data, i.e.

Name     Height     Weight   Age
Alice    180        NA       35
Bob      NA         80       27
Alice    NA         70       NA
Charles  170        75       NA

In this case the key is the name, and I would like to apply to each column a function like

f <- function(x) {
  x <- x[!is.na(x)]  # drop missing values
  x[1]               # first remaining value (NA if none are left)
}

while aggregating by the key (i.e., the "Name" column), so as to obtain the following result:

Name     Height     Weight   Age
Alice    180        70       35
Bob      NA         80       27
Charles  170        75       NA
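
For anyone who wants to reproduce this, the table above can be rebuilt as below. This is only a sketch: the post does not show how dt was created, so the numeric column types and stringsAsFactors = FALSE are assumptions. Applying f by hand to the rows of one key shows the values expected in the aggregated result.

```r
# Rebuild the example data from the question (column types are assumed).
dt <- data.frame(
  Name   = c("Alice", "Bob", "Alice", "Charles"),
  Height = c(180, NA, NA, 170),
  Weight = c(NA, 80, 70, 75),
  Age    = c(35, 27, NA, NA),
  stringsAsFactors = FALSE
)

# The function from the question: first non-missing value of a vector.
f <- function(x) {
  x <- x[!is.na(x)]
  x[1]
}

# Applied by hand to Alice's two rows, one column at a time.
alice <- dt[dt$Name == "Alice", ]
alice_expected <- sapply(alice[, -1], f)
alice_expected
```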

I tried

dt_agg <- aggregate(. ~ Name,
                    data = dt,
                    FUN = f)

and I got some errors, so I then tried the following

dt_agg_1 <- aggregate(Height ~ Name,
                      data = dt,
                      FUN = f)

dt_agg_2 <- aggregate(Weight ~ Name,
                      data = dt,
                      FUN = f)

and this time it worked.

Since I have 50 columns, this second approach is quite cumbersome for me. Is there a way to fix the first approach?

Thanks for the help!

Answer 1

You were very close with the aggregate function; you only needed to change how aggregate handles NAs (from na.omit to na.pass). My guess is that the formula method of aggregate first removes every row that contains an NA and only then aggregates, instead of removing NAs column by column as it iterates over the columns to be aggregated. Since your example data frame has an NA in every row, you end up with a 0-row data frame (which is the error I was getting when running your code). I tested this by removing all but one NA, and your code then works as-is. So we set na.action = na.pass to pass the NAs through.

dt_agg <- aggregate(. ~ Name,
                    data = dt,
                    FUN = f, na.action = "na.pass")
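
The explanation above can be checked directly. A minimal sketch, assuming dt matches the table in the question (the data.frame() call is a reconstruction, not taken from the post):

```r
# Reconstructed example data (an assumption; the post only shows the printed table).
dt <- data.frame(
  Name   = c("Alice", "Bob", "Alice", "Charles"),
  Height = c(180, NA, NA, 170),
  Weight = c(NA, 80, 70, 75),
  Age    = c(35, 27, NA, NA),
  stringsAsFactors = FALSE
)
f <- function(x) x[!is.na(x)][1]

# The formula method uses na.action = na.omit by default, which drops any
# row containing an NA. Here every row has one, so nothing is left.
rows_left <- nrow(na.omit(dt))

# With the default na.action the call fails; tryCatch() captures the message.
res_default <- tryCatch(
  aggregate(. ~ Name, data = dt, FUN = f),
  error = function(e) conditionMessage(e)
)

# With na.pass the NAs reach f, which removes them column by column.
res_pass <- aggregate(. ~ Name, data = dt, FUN = f, na.action = na.pass)
res_pass
```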

Original answer

dt_agg <- aggregate(dt[, -1], 
                    by = list(dt$Name),
                    FUN = f)
dt_agg
# Group.1 Height Weight Age
# 1   Alice    180     70  35
# 2     Bob     NA     80  27
# 3 Charles    170     75  NA

Answer 2

You can do this with dplyr:

library(dplyr)
df %>%
  group_by(Name) %>%
  # sort() drops NAs, so [1] is the smallest non-missing value (NA if none)
  summarize(across(everything(), ~ sort(.x)[1]))

Result:

# A tibble: 3 x 4
     Name Height Weight   Age
   <fctr>  <int>  <int> <int>
1   Alice    180     70    35
2     Bob     NA     80    27
3 Charles    170     75    NA

Data:

df = read.table(text = "Name     Height     Weight   Age
Alice    180        NA       35
Bob      NA         80       27
Alice    NA         70       NA
Charles  170        75       NA", header = TRUE)
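
One caveat worth checking: sort() drops NAs but also reorders the values, so sort(x)[1] is the smallest non-missing value, not the first one in row order as in the question's f. The two agree on this data because each group has at most one non-missing value per column. A small sketch with a made-up vector (the values and helper names are hypothetical):

```r
# Hypothetical column values for one group, in row order.
x <- c(NA, 75, 70)

first_non_na <- function(x) x[!is.na(x)][1]  # what the question's f does
smallest     <- function(x) sort(x)[1]       # what sort(.)[1] does

first_non_na(x)  # 75: first non-missing value in row order
smallest(x)      # 70: minimum, because sort() reorders

# Both return NA when every value is missing.
all_missing <- c(NA_real_, NA_real_)
first_non_na(all_missing)
smallest(all_missing)
```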

Answer 3

Here is an option with data.table:

library(data.table)
setDT(df)[, lapply(.SD, function(x) head(sort(x), 1)), Name]
#      Name Height Weight Age
#1:   Alice    180     70  35
#2:     Bob     NA     80  27
#3: Charles    170     75  NA
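
If the first non-missing value in row order is wanted (as in the question's f) rather than the smallest one, the same data.table pattern can use plain indexing instead of sort(). A sketch, assuming the df from Answer 2; first_non_na is a name introduced here for illustration:

```r
library(data.table)

df <- read.table(text = "Name     Height     Weight   Age
Alice    180        NA       35
Bob      NA         80       27
Alice    NA         70       NA
Charles  170        75       NA", header = TRUE)

# Hypothetical helper: first non-missing value, or an NA of the same type.
first_non_na <- function(x) x[!is.na(x)][1]

# as.data.table() works on a copy, so df itself stays a plain data.frame
# (setDT() would convert it in place instead).
res <- as.data.table(df)[, lapply(.SD, first_non_na), by = Name]
res
```

Because x[1] on an empty vector gives an NA of the column's own type, every group returns exactly one value per column.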

Answer 4

Simply add na.action = na.pass to the aggregate() call:

aggdf <- aggregate(.~Name, data=df, FUN=f, na.action=na.pass)
#      Name Height Weight Age
# 1   Alice    180     70  35
# 2     Bob     NA     80  27
# 3 Charles    170     75  NA

Answer 5

First, add an ifelse() to your function so that it still returns a value (NA) when all values are NA:

f <- function(x) {
  x <- x[!is.na(x)]
  ifelse(length(x) == 0, NA, x)
}

You can then use dplyr to aggregate:

library(dplyr)
dt %>% group_by(Name) %>% summarise(across(everything(), f))

This returns:

# A tibble: 3 x 4
     Name Height Weight   Age
   <fctr>  <dbl>  <dbl> <dbl>
1   Alice    180     70    35
2     Bob     NA     80    27
3 Charles    170     75    NA
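
A note on why this returns a single value per group: ifelse() gives back a result the same length as its test, and length(x) == 0 is a single TRUE or FALSE, so only the first element of x comes back. Indexing an empty vector with [1] also yields NA, so on these inputs the question's original f appears to give the same answers. A small check with made-up values (f_ifelse and f_index are names introduced here):

```r
f_ifelse <- function(x) {  # the version from this answer
  x <- x[!is.na(x)]
  ifelse(length(x) == 0, NA, x)
}
f_index <- function(x) {   # the version from the question
  x <- x[!is.na(x)]
  x[1]
}

x <- c(NA, 70, 72)         # hypothetical values
f_ifelse(x)                # 70: the test has length 1, so one element is returned
f_index(x)                 # 70

all_missing <- c(NA_real_, NA_real_)
f_ifelse(all_missing)      # NA
f_index(all_missing)       # NA
```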

5 answers

Answer 1
3 2017-10-10 13:43:04

Answer 2
2 accepted 2017-10-10 13:37:31

Answer 3
2 2017-10-10 13:51:24

Answer 4
2 2017-10-10 13:59:32

Answer 5
1 2017-10-10 13:46:30
