Merge dataframe rows with text, numeric, and NA values

Question

I have a dataframe like the one below (though much larger).

name	age	sex	favcolor	grade	score
tim	NA	NA	blue	12	100
tim	18	male	red	12	50
dave	17	male	red	12	85
mike	15	male	green	10	95
john	12	male	NA	7	80
john	12	NA	orange	7	90
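
For reproducibility, the table above can be rebuilt roughly as follows. This is only a sketch: the column types (numeric for age, grade, and score; character for the rest) are assumed from the values shown, and the object name `df` is chosen to match the name used in the answer.

```r
# Sample data reconstructed from the table above (column types are assumed).
df <- data.frame(
  name     = c("tim", "tim", "dave", "mike", "john", "john"),
  age      = c(NA, 18, 17, 15, 12, 12),
  sex      = c(NA, "male", "male", "male", "male", NA),
  favcolor = c("blue", "red", "red", "green", NA, "orange"),
  grade    = c(12, 12, 12, 10, 7, 7),
  score    = c(100, 50, 85, 95, 80, 90),
  stringsAsFactors = FALSE  # keep text columns as character, not factor
)
```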

There are a few things I want to do. My primary goal is to merge the rows by the name variable, such that each name gets one row. Second, when merging rows, I want cells with data to override cells that are NA (e.g. tim with age and sex). Third, if the merging rows both have text values (e.g. tim with favcolor), I want to keep the first one. And lastly, for rows that both have values in numeric columns (age, grade, and score), I want the new value to be the mean of the merging rows.

If all these rules are followed, the dataframe should look something like this.

name	age	sex	favcolor	grade	score
tim	18	male	blue	12	75
dave	17	male	red	12	85
mike	15	male	green	10	95
john	12	male	orange	7	85

Is there a straightforward way to accomplish this? I've tried about 30 different things, but it never turns out the way I want it to. Any help would be greatly appreciated.

Answer 1

You can group_by(name) and use summarize to collapse each group's rows into a single row: numeric columns take the mean with NAs ignored, and character columns take the first non-NA value.

Finally, use relocate to restore the column order of the input.

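library(dplyr)

df %>% 
  group_by(name) %>% 
  # numeric columns: mean of the non-NA values in each group
  summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
            # character columns: first non-NA value in each group
            across(where(is.character), ~ .x[!is.na(.x)][1])) %>% 
  # put the columns back in the input's order
  relocate(all_of(colnames(df)))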

# A tibble: 4 × 6
  name    age sex   favcolor grade score
  <chr> <dbl> <chr> <chr>    <dbl> <dbl>
1 dave     17 male  red         12    85
2 john     12 male  orange       7    85
3 mike     15 male  green       10    95
4 tim      18 male  blue        12    75
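
One edge case is worth noting, although it does not occur in the sample data: if every numeric value in a group is NA, `mean(.x, na.rm = TRUE)` returns NaN rather than NA. A minimal base R sketch of a workaround follows; `mean_or_na` is a hypothetical helper name, not part of dplyr.

```r
# Hypothetical helper: mean of the non-NA values, or NA if none remain.
mean_or_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else mean(x)
}

all_missing <- c(NA_real_, NA_real_)
plain  <- mean(all_missing, na.rm = TRUE)  # NaN: mean of an empty vector
guarded <- mean_or_na(all_missing)         # NA
typical <- mean_or_na(c(100, NA, 50))      # 75
```

If that behaviour matters, the helper could be passed in place of the lambda, as in `across(where(is.numeric), mean_or_na)`.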

Answer 1: 5 · accepted · 2022-04-04 13:51:13
