How to mutate two list columns with dplyr::mutate
I have the following data frame:
library(tidyverse)
dat <- structure(list(peptide_name = c(
"foo", "foo", "foo",
"foo", "foo", "foo", "bar", "bar", "bar",
"bar", "bar", "bar"
), predicted = c(
1, 0.965193935171986,
1.002152924502, 1.13340754433401, 1.24280233366, 1.43442435500686,
1, 1.07873571757982, 1.141383975916, 1.247359728244, 1.259245716526,
1.23549751707385
), trueval = c(
1, 1.174927114, 1.279883382, 1.752186589,
1.994169096, 2.358600583, 1, 0.977742448, 1.305246423, 1.500794913,
1.532591415, 1.197138315
)), row.names = c(NA, -12L), class = c(
"tbl_df",
"tbl", "data.frame"
))
dat
It looks like this:
   peptide_name predicted trueval
   <chr>            <dbl>   <dbl>
 1 foo              1       1
 2 foo              0.965   1.17
 3 foo              1.00    1.28
 4 foo              1.13    1.75
 5 foo              1.24    1.99
 6 foo              1.43    2.36
 7 bar              1       1
 8 bar              1.08    0.978
 9 bar              1.14    1.31
10 bar              1.25    1.50
11 bar              1.26    1.53
12 bar              1.24    1.20
The foo and bar peptides each contain the same number of rows. What I want to do is compute the Pearson correlation for each of the two peptides.
The following code is my attempt:
dat %>%
group_by(peptide_name) %>%
# Here create list-columns
nest() %>%
mutate(pn = row_number()) %>%
dplyr::select(pn, everything()) %>%
pivot_wider(-pn, names_from = peptide_name, values_from = data) %>%
# Attempt to calculate Pearson correlation
mutate(pearson = cor(foo, bar, method = "pearson"))
But it failed:
Error in `mutate()`:
! Problem while computing `pearson = cor(foo, bar, method =
"pearson")`.
Caused by error in `cor()`:
! 'x' must be numeric
What's the right way to do it?
The final expected result of the correlation:
foo bar type
0.97 0.85 pearson_cor
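The problem seems to be in how you are passing the arguments to the cor() function. After pivot_wider(), foo and bar are list-columns that each hold a single nested tibble, so cor() receives a list rather than a numeric vector, hence the "'x' must be numeric" error. I was able to get the following code to work: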
dat %>%
group_by(peptide_name) %>%
# Here create list-columns
nest() %>%
mutate(pn = row_number()) %>%
dplyr::select(pn, everything()) %>%
pivot_wider(-pn, names_from = peptide_name, values_from = data) %>%
mutate(pearson_foo = cor(foo[[1]][[1]], foo[[1]][[2]], method = "pearson"),
pearson_bar = cor(bar[[1]][[1]], bar[[1]][[2]], method = "pearson"))
However, I'd be curious to see if anyone has a more elegant solution to your problem, since my solution involves adding an extra column. I'll keep playing around with it and see if I can come up with something better...
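To see why the double indexing is needed, here is a minimal sketch on a made-up toy tibble. The names toy, g, x and y are hypothetical and not from the question; it assumes dplyr and tidyr (1.0 or later) are installed. After nesting and widening, each column is a list holding one tibble, so the numeric vectors have to be pulled out before cor() can use them.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two groups, two numeric columns
toy <- tibble(
  g = rep(c("foo", "bar"), each = 3),
  x = c(1, 2, 3, 1, 2, 3),
  y = c(2, 4, 7, 3, 2, 1)
)

# Nest x and y per group, then spread the groups into columns
wide <- toy %>%
  nest(data = c(x, y)) %>%
  pivot_wider(names_from = g, values_from = data)

# Each column is a list-column, not a numeric vector
is.list(wide$foo)     # TRUE
is.numeric(wide$foo)  # FALSE, which is why cor(foo, bar) fails

# [[1]] extracts the nested tibble; a second [[ ]] or $ extracts a column
inner_foo <- wide$foo[[1]]
inner_bar <- wide$bar[[1]]
r_foo <- cor(inner_foo$x, inner_foo$y, method = "pearson")
r_bar <- cor(inner_bar[[1]], inner_bar[[2]], method = "pearson")
```

Indexing by name (inner_foo$x) and by position (inner_bar[[1]]) are equivalent here; the by-name form is a little easier to read.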
Edit: Ritchie's answer with summarise() is way easier!
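A summarise()-based version might look like the following sketch. It is a reconstruction of the general idea, not necessarily Ritchie's exact code, and it assumes the goal is the correlation between predicted and trueval within each peptide, which is what the cor() calls above compute. The names res and wide_res are introduced here for illustration only.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt with tibble() so the block is self-contained
dat <- tibble(
  peptide_name = rep(c("foo", "bar"), each = 6),
  predicted = c(
    1, 0.965193935171986, 1.002152924502, 1.13340754433401,
    1.24280233366, 1.43442435500686,
    1, 1.07873571757982, 1.141383975916, 1.247359728244,
    1.259245716526, 1.23549751707385
  ),
  trueval = c(
    1, 1.174927114, 1.279883382, 1.752186589, 1.994169096, 2.358600583,
    1, 0.977742448, 1.305246423, 1.500794913, 1.532591415, 1.197138315
  )
)

# One correlation per peptide; no nesting or list-columns needed
res <- dat %>%
  group_by(peptide_name) %>%
  summarise(pearson = cor(predicted, trueval, method = "pearson"))

# Optional: reshape to the one-row layout shown in the question
wide_res <- res %>%
  pivot_wider(names_from = peptide_name, values_from = pearson) %>%
  mutate(type = "pearson_cor")
```

Because cor() is called on the plain numeric columns inside each group, the "'x' must be numeric" error never arises.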