Merging dataframe rows with text, numeric, and NA values
I have a dataframe like the one below (though much larger).
name | age | sex | favcolor | grade | score |
---|---|---|---|---|---|
tim | NA | NA | blue | 12 | 100 |
tim | 18 | male | red | 12 | 50 |
dave | 17 | male | red | 12 | 85 |
mike | 15 | male | green | 10 | 95 |
john | 12 | male | NA | 7 | 80 |
john | 12 | NA | orange | 7 | 90 |
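To make the example reproducible, the table above can be rebuilt roughly as follows. This is a sketch: the column types (numeric for `age`, `grade`, and `score`; character for the rest) are an assumption, since the post shows only the printed table.

```r
# Rebuild the example table. Column types are assumed, not taken from the post.
df <- data.frame(
  name     = c("tim", "tim", "dave", "mike", "john", "john"),
  age      = c(NA, 18, 17, 15, 12, 12),
  sex      = c(NA, "male", "male", "male", "male", NA),
  favcolor = c("blue", "red", "red", "green", NA, "orange"),
  grade    = c(12, 12, 12, 10, 7, 7),
  score    = c(100, 50, 85, 95, 80, 90),
  stringsAsFactors = FALSE
)

str(df)  # inspect the column types
```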
There are a few things I want to do. My primary goal is to merge the rows by the `name` variable, so that each name gets one row. Second, when merging rows, I want cells with data to override cells that are NA (e.g. tim with `age` and `sex`). Third, if the merging rows both have text values (e.g. tim with `favcolor`), I want to keep the first one. And lastly, for rows that both have values in numeric columns (`age`, `grade`, and `score`), I want the new value to be the mean of the merging rows.
If all these rules are followed, the dataframe should look something like this.
name | age | sex | favcolor | grade | score |
---|---|---|---|---|---|
tim | 18 | male | blue | 12 | 75 |
dave | 17 | male | red | 12 | 85 |
mike | 15 | male | green | 10 | 95 |
john | 12 | male | orange | 7 | 85 |
Is there a straightforward way to accomplish this? I've tried about 30 different things, but it never turns out the way I want it to. Any help would be greatly appreciated.
You can `group_by(name)` and use `summarize` to collapse the rows of each group into a single one: numeric columns take the mean of their non-NA values, and character columns take their first non-NA value. Finally, `relocate` puts the columns back in the same order as the input.
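The two per-column rules passed to `summarize` can be checked on plain vectors first. This is a minimal sketch; `first_non_na` is a hypothetical helper name used only for illustration and is not part of dplyr.

```r
# Hypothetical helper: first non-NA element of a vector (NA if there is none).
first_non_na <- function(x) x[!is.na(x)][1]

first_non_na(c(NA, "male"))      # NA is skipped, "male" is kept
first_non_na(c("blue", "red"))   # both present, the first one wins
first_non_na(c(NA_character_))   # nothing to keep, result stays NA

mean(c(NA, 18), na.rm = TRUE)    # NA is ignored, so the value with data wins
mean(c(100, 50), na.rm = TRUE)   # both present, so the result is their mean

# Edge case: a group whose numeric values are all NA yields NaN, not NA.
mean(c(NA_real_, NA_real_), na.rm = TRUE)
```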
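library(dplyr)

df %>%
  group_by(name) %>%
  summarize(
    # numeric columns: mean of the non-NA values
    across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
    # character columns: first non-NA value
    across(where(is.character), ~ .x[!is.na(.x)][1])
  ) %>%
  # restore the original column order
  relocate(all_of(colnames(df)))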
# A tibble: 4 × 6
  name    age sex   favcolor grade score
  <chr> <dbl> <chr> <chr>    <dbl> <dbl>
1 dave     17 male  red         12    85
2 john     12 male  orange       7    85
3 mike     15 male  green       10    95
4 tim      18 male  blue        12    75
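Note that the output above is sorted by `name` rather than by first appearance, which differs from the row order in the question's expected table. If that order matters, one option is a base R variant of the same rules. This is a sketch under assumptions, not the answer's method: `collapse_col` and `merge_rows` are hypothetical names, and the sample data assumes the column types shown in the output above. It also returns NA instead of NaN when every numeric value in a group is NA.

```r
# Sample data, with column types assumed from the printed output.
df <- data.frame(
  name     = c("tim", "tim", "dave", "mike", "john", "john"),
  age      = c(NA, 18, 17, 15, 12, 12),
  sex      = c(NA, "male", "male", "male", "male", NA),
  favcolor = c("blue", "red", "red", "green", NA, "orange"),
  grade    = c(12, 12, 12, 10, 7, 7),
  score    = c(100, 50, 85, 95, 80, 90),
  stringsAsFactors = FALSE
)

# Collapse one column of one group: mean for numeric, first non-NA otherwise.
collapse_col <- function(x) {
  if (is.numeric(x)) {
    m <- mean(x, na.rm = TRUE)
    if (is.nan(m)) NA_real_ else m  # all-NA group: return NA rather than NaN
  } else {
    x[!is.na(x)][1]
  }
}

# Merge rows sharing the same key, keeping groups in first-appearance order.
# Rows whose key is NA are dropped by split().
merge_rows <- function(df, key = "name") {
  groups <- split(df, factor(df[[key]], levels = unique(df[[key]])))
  rows <- lapply(groups, function(g) {
    as.data.frame(lapply(g, collapse_col), stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

merged <- merge_rows(df)
print(merged)
```

Splitting on a factor whose levels follow `unique(df[[key]])` is what keeps tim, dave, mike, john in their original order.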