How to mutate two list columns with dplyr::mutate
I have the following data frame:
library(tidyverse)
dat <- structure(list(peptide_name = c(
"foo", "foo", "foo",
"foo", "foo", "foo", "bar", "bar", "bar",
"bar", "bar", "bar"
), predicted = c(
1, 0.965193935171986,
1.002152924502, 1.13340754433401, 1.24280233366, 1.43442435500686,
1, 1.07873571757982, 1.141383975916, 1.247359728244, 1.259245716526,
1.23549751707385
), trueval = c(
1, 1.174927114, 1.279883382, 1.752186589,
1.994169096, 2.358600583, 1, 0.977742448, 1.305246423, 1.500794913,
1.532591415, 1.197138315
)), row.names = c(NA, -12L), class = c(
"tbl_df",
"tbl", "data.frame"
))
dat
It looks like this:
peptide_name predicted trueval
<chr> <dbl> <dbl>
1 foo 1 1
2 foo 0.965 1.17
3 foo 1.00 1.28
4 foo 1.13 1.75
5 foo 1.24 1.99
6 foo 1.43 2.36
7 bar 1 1
8 bar 1.08 0.978
9 bar 1.14 1.31
10 bar 1.25 1.50
11 bar 1.26 1.53
12 bar 1.24 1.20
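The foo and bar peptides each contain the same number of rows. What I want to do is compute the Pearson correlation between predicted and trueval for each of the two peptides.
The following code is my attempt: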
dat %>%
group_by(peptide_name) %>%
# Here create list-columns
nest() %>%
mutate(pn = row_number()) %>%
dplyr::select(pn, everything()) %>%
pivot_wider(-pn, names_from = peptide_name, values_from = data) %>%
# Attempt to calculate Pearson correlation
mutate(pearson = cor(foo, bar, method = "pearson"))
But it fails:
Error in `mutate()`:
! Problem while computing `pearson = cor(foo, bar, method =
"pearson")`.
Caused by error in `cor()`:
! 'x' must be numeric
What is the right way to do this?
The final expected correlation result is:
foo bar type
0.97 0.85 pearson_cor
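The problem seems to be in how you pass arguments to the cor() function. After pivot_wider(), foo and bar are list-columns that each hold a tibble, while cor() expects numeric vectors. I was able to get the following code to work: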
dat %>%
group_by(peptide_name) %>%
# Here create list-columns
nest() %>%
mutate(pn = row_number()) %>%
dplyr::select(pn, everything()) %>%
pivot_wider(-pn, names_from = peptide_name, values_from = data) %>%
# foo[[1]] is the nested tibble for foo; [[1]] and [[2]] pick its predicted and trueval columns
mutate(pearson_foo = cor(foo[[1]][[1]], foo[[1]][[2]], method = "pearson"),
pearson_bar = cor(bar[[1]][[1]], bar[[1]][[2]], method = "pearson"))
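However, I'd be curious to know whether anyone has a more elegant solution to your problem, since mine involves adding an extra column. I'll keep playing with it and see if I can come up with something better...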
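One possible way to stay with list-columns but avoid the positional [[1]][[2]] indexing is to keep one nested row per peptide and map over the list-column with purrr::map_dbl(). This is only a sketch, assuming tidyr and purrr are available (both ship with the tidyverse); the names res_nested and pearson_cor are illustrative and do not come from the question.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# Same values as `dat` in the question, rebuilt with tibble() for brevity
dat <- tibble(
  peptide_name = rep(c("foo", "bar"), each = 6),
  predicted = c(
    1, 0.965193935171986, 1.002152924502, 1.13340754433401,
    1.24280233366, 1.43442435500686,
    1, 1.07873571757982, 1.141383975916, 1.247359728244,
    1.259245716526, 1.23549751707385
  ),
  trueval = c(
    1, 1.174927114, 1.279883382, 1.752186589, 1.994169096, 2.358600583,
    1, 0.977742448, 1.305246423, 1.500794913, 1.532591415, 1.197138315
  )
)

res_nested <- dat %>%
  # One row per peptide; `data` is a list-column of tibbles
  nest(data = c(predicted, trueval)) %>%
  # map_dbl() hands each nested tibble to cor() and returns one number per row
  mutate(pearson_cor = map_dbl(
    data,
    ~ cor(.x$predicted, .x$trueval, method = "pearson")
  ))

res_nested
```

Here cor() always receives two numeric vectors referenced by column name, so the call does not depend on the column order inside the nested tibbles.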
Edit: Ritchie's answer using summarise() is much simpler!
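For readers who cannot see that answer, here is a minimal sketch of what a summarise()-based approach could look like. It is an assumption about the general idea, not a copy of the referenced answer, and the names res and res_wide are illustrative. Because the data is grouped but never nested, cor() works directly on the numeric columns.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same values as `dat` in the question, rebuilt with tibble() for brevity
dat <- tibble(
  peptide_name = rep(c("foo", "bar"), each = 6),
  predicted = c(
    1, 0.965193935171986, 1.002152924502, 1.13340754433401,
    1.24280233366, 1.43442435500686,
    1, 1.07873571757982, 1.141383975916, 1.247359728244,
    1.259245716526, 1.23549751707385
  ),
  trueval = c(
    1, 1.174927114, 1.279883382, 1.752186589, 1.994169096, 2.358600583,
    1, 0.977742448, 1.305246423, 1.500794913, 1.532591415, 1.197138315
  )
)

# One correlation per peptide, computed on the plain numeric columns
res <- dat %>%
  group_by(peptide_name) %>%
  summarise(pearson_cor = cor(predicted, trueval, method = "pearson"))

# Optional: reshape to the one-row layout shown in the question
res_wide <- res %>%
  pivot_wider(names_from = peptide_name, values_from = pearson_cor) %>%
  mutate(type = "pearson_cor") %>%
  select(foo, bar, type)

res_wide
```

Rounded to two decimals, the foo and bar values agree with the 0.97 and 0.85 given as the expected result in the question.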