
Merging dataframe rows with text, numeric, and NA values

I have a dataframe like the one below (though much larger).

name  age  sex   favcolor  grade  score
tim   NA   NA    blue      12     100
tim   18   male  red       12     50
dave  17   male  red       12     85
mike  15   male  green     10     95
john  12   male  NA        7      80
john  12   NA    orange    7      90
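
For reproducibility, the table above could be built as follows. This is a minimal sketch that assumes age, grade, and score are stored as numeric and the text columns as character (not factor); the real column types are not shown here.

```r
# Sample data reconstructed from the table above.
# Assumption: numeric columns are doubles, text columns are character vectors.
df <- data.frame(
  name     = c("tim", "tim", "dave", "mike", "john", "john"),
  age      = c(NA, 18, 17, 15, 12, 12),
  sex      = c(NA, "male", "male", "male", "male", NA),
  favcolor = c("blue", "red", "red", "green", NA, "orange"),
  grade    = c(12, 12, 12, 10, 7, 7),
  score    = c(100, 50, 85, 95, 80, 90),
  stringsAsFactors = FALSE
)
```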

There are a few things I want to do. My primary goal is to merge the rows by the name variable, so that each name gets one row. Second, when merging rows, I want cells with data to override cells that are NA (e.g. tim with age and sex). Third, if the merging rows both have text values (e.g. tim with favcolor), I want to keep the first one. And lastly, for rows that both have values in numeric columns (age, grade, and score), I want the new value to be the mean of the merging rows.

If all these rules are followed, the dataframe should look something like this.

name  age  sex   favcolor  grade  score
tim   18   male  blue      12     75
dave  17   male  red       12     85
mike  15   male  green     10     95
john  12   male  orange    7      85

Is there a straightforward way to accomplish this? I've tried about 30 different things, but it never turns out the way I want it to. Any help would be greatly appreciated.

You can group_by(name) and use summarize to collapse each group's rows into a single row: numeric columns take the mean of their non-NA values, and character columns take the first non-NA value.

Finally, use relocate to restore the column order of the input.

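library(dplyr)

df %>% 
  group_by(name) %>% 
  # numeric columns: mean of the non-NA values in each group
  summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
            # character columns: first non-NA value in each group
            across(where(is.character), ~ .x[!is.na(.x)][1])) %>% 
  # summarize puts the numeric columns first; restore the input order
  relocate(all_of(colnames(df)))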

# A tibble: 4 × 6
  name    age sex   favcolor grade score
  <chr> <dbl> <chr> <chr>    <dbl> <dbl>
1 dave     17 male  red         12    85
2 john     12 male  orange       7    85
3 mike     15 male  green       10    95
4 tim      18 male  blue        12    75
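
One edge case worth checking: if every value of a numeric column is NA within a group, mean(.x, na.rm = TRUE) returns NaN rather than NA. The sketch below shows one possible guard, assuming you would rather keep NA in that situation. The data frame df2 and the helper mean_or_na are hypothetical names made up for this illustration and do not come from the question.

```r
library(dplyr)

# Hypothetical data: "amy" has no recorded age in any of her rows
df2 <- tibble(
  name  = c("amy", "amy", "bob"),
  age   = c(NA, NA, 20),
  sex   = c(NA, "female", "male"),
  score = c(70, 90, 60)
)

# Hypothetical helper: mean of the non-missing values,
# or NA (instead of NaN) when every value is missing
mean_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else mean(x, na.rm = TRUE)
}

result <- df2 %>%
  group_by(name) %>%
  summarize(across(where(is.numeric), mean_or_na),
            across(where(is.character), ~ .x[!is.na(.x)][1])) %>%
  relocate(all_of(colnames(df2)))
```

Here amy's age stays NA while her score is still averaged across her two rows; whether NA or NaN is preferable depends on how the result is used downstream.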
