How to keep columns after `summarise` operation in `dplyr`
I have data of this type:
df <- data.frame(name = c("Acer laurinum", "Acer laurinum Hassk.", "Acmella paniculata",
                          "Adinandra cf. integerrima", "Adinandra cf. integerrima T.Anderson"),
                 value1 = c(1, 2, 3, 4, 5),
                 value2 = c(2, 3, 4, 5, 6))
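I want to `summarise` the columns `value1` and `value2` by the matching part of the column `name`, and keep the unique value of the new column `author`. This code does only the summarising part, but `author` is lost: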
library(dplyr)
library(stringr)

df %>%
  mutate(author = str_extract(name, "(?<=\\s)(?=.*\\.)[.\\w]+$"),
         name1 = trimws(str_remove(name, "(?<=\\s)(?=.*\\.)[.\\w]+$"))) %>%
  group_by(name1) %>%
  summarise(across(c(value1, value2), sum))
# A tibble: 3 x 3
name1 value1 value2
* <chr> <dbl> <dbl>
1 Acer laurinum 3 5
2 Acmella paniculata 3 4
3 Adinandra cf. integerrima 9 11
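`summarise()` returns one row per group that contains only the grouping columns and the newly computed summaries, so any other column, including `author`, is dropped unless it is summarised too. A minimal sketch, using the same data and regex as the question, that inspects the intermediate columns before grouping; `author_pattern`, `parsed` and `summarised` are illustrative names, not part of the original code:

```r
library(dplyr)
library(stringr)

df <- data.frame(name = c("Acer laurinum", "Acer laurinum Hassk.", "Acmella paniculata",
                          "Adinandra cf. integerrima", "Adinandra cf. integerrima T.Anderson"),
                 value1 = c(1, 2, 3, 4, 5),
                 value2 = c(2, 3, 4, 5, 6),
                 stringsAsFactors = FALSE)

# Same pattern as in the question: a trailing run of word characters/dots
# that is preceded by whitespace and contains at least one dot
author_pattern <- "(?<=\\s)(?=.*\\.)[.\\w]+$"

# Before grouping, every row still has its own author (or NA)
parsed <- df %>%
  mutate(author = str_extract(name, author_pattern),
         name1  = trimws(str_remove(name, author_pattern)))
print(parsed[, c("name1", "author")])

# After summarise(), only the grouping column and the summaries remain
summarised <- parsed %>%
  group_by(name1) %>%
  summarise(across(c(value1, value2), sum))
print(names(summarised))
```

Because `author` is not a grouping column and is not computed inside `summarise()`, it has no place in the one-row-per-group result.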
Expected output:
# A tibble: 3 x 4
name1 value1 value2 author
* <chr> <dbl> <dbl> <chr>
1 Acer laurinum 3 5 Hassk.
2 Acmella paniculata 3 4 <NA>
3 Adinandra cf. integerrima 9 11 T.Anderson
You can use `na.omit(author)[1]` to get the first non-NA value of `author` in each group.
library(dplyr)
library(stringr)

df %>%
  mutate(author = str_extract(name, "(?<=\\s)(?=.*\\.)[.\\w]+$"),
         name1 = trimws(str_remove(name, "(?<=\\s)(?=.*\\.)[.\\w]+$"))) %>%
  group_by(name1) %>%
  summarise(across(c(value1, value2), sum),
            author = na.omit(author)[1])
# name1 value1 value2 author
# <chr> <dbl> <dbl> <chr>
#1 Acer laurinum 3 5 Hassk.
#2 Acmella paniculata 3 4 NA
#3 Adinandra cf. integerrima 9 11 T.Anderson
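The `NA` for `Acmella paniculata` comes from how `na.omit(x)[1]` behaves: when a group has no non-NA author, `na.omit()` returns an empty vector, and indexing position 1 of an empty vector yields `NA`. A small sketch of the three cases; `first_non_na()` is a hypothetical helper that simply wraps the expression used in the answer:

```r
# Hypothetical helper wrapping the expression used inside summarise() above
first_non_na <- function(x) na.omit(x)[1]

# One non-NA value in the group: that value is returned
first_non_na(c(NA, "Hassk."))

# No non-NA values: na.omit() gives character(0), and [1] on it gives NA_character_
first_non_na(c(NA_character_, NA_character_))

# Several distinct non-NA values: only the first one is kept
first_non_na(c("A", "B"))
```

Note the last case: if a group could contain more than one distinct author, this approach silently keeps only the first, so it assumes at most one author per `name1`, as in the example data.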