How to mutate two list-columns with dplyr::mutate

Question

I have the following data frame:

library(tidyverse)
dat <- structure(list(peptide_name = c(
  "foo", "foo", "foo",
  "foo", "foo", "foo", "bar", "bar", "bar",
  "bar", "bar", "bar"
), predicted = c(
  1, 0.965193935171986,
  1.002152924502, 1.13340754433401, 1.24280233366, 1.43442435500686,
  1, 1.07873571757982, 1.141383975916, 1.247359728244, 1.259245716526,
  1.23549751707385
), trueval = c(
  1, 1.174927114, 1.279883382, 1.752186589,
  1.994169096, 2.358600583, 1, 0.977742448, 1.305246423, 1.500794913,
  1.532591415, 1.197138315
)), row.names = c(NA, -12L), class = c(
  "tbl_df",
  "tbl", "data.frame"
))

dat

It looks like this:

   peptide_name predicted trueval
   <chr>            <dbl>   <dbl>
 1 foo              1       1    
 2 foo              0.965   1.17 
 3 foo              1.00    1.28 
 4 foo              1.13    1.75 
 5 foo              1.24    1.99 
 6 foo              1.43    2.36 
 7 bar              1       1    
 8 bar              1.08    0.978
 9 bar              1.14    1.31 
10 bar              1.25    1.50 
11 bar              1.26    1.53 
12 bar              1.24    1.20

The foo and bar peptides each contain the same number of rows. What I want to do is compute the Pearson correlation between predicted and trueval for each of the two peptides.

The following code is my attempt:

dat %>%  
  group_by(peptide_name) %>% 
  # Here create list-columns
  nest() %>% 
  mutate(pn = row_number()) %>% 
  dplyr::select(pn, everything()) %>% 
  pivot_wider(-pn, names_from = peptide_name, values_from = data) %>% 
  # Attempt to calculate Pearson correlation
  mutate(pearson = cor(foo, bar, method = "pearson"))

But it fails with:

Error in `mutate()`:
! Problem while computing `pearson = cor(foo, bar, method =
  "pearson")`.
Caused by error in `cor()`:
! 'x' must be numeric

What's the right way to do it?

The expected final result:

foo   bar   type
0.97  0.85  pearson_cor
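The expected values appear to be the correlation between predicted and trueval, computed separately within each peptide. The following is a base R sketch that makes this assumption and needs no tidyverse. The variable name r is only illustrative.

```r
# Same values as `dat` above, rebuilt with base R only.
dat <- data.frame(
  peptide_name = rep(c("foo", "bar"), each = 6),
  predicted = c(1, 0.965193935171986, 1.002152924502, 1.13340754433401,
                1.24280233366, 1.43442435500686,
                1, 1.07873571757982, 1.141383975916, 1.247359728244,
                1.259245716526, 1.23549751707385),
  trueval = c(1, 1.174927114, 1.279883382, 1.752186589, 1.994169096,
              2.358600583,
              1, 0.977742448, 1.305246423, 1.500794913, 1.532591415,
              1.197138315)
)

# Assumption: the target is cor(predicted, trueval) within each peptide.
# split() gives one data frame per peptide; vapply() returns one number each.
r <- vapply(
  split(dat, dat$peptide_name),
  function(d) cor(d$predicted, d$trueval, method = "pearson"),
  numeric(1)
)
round(r, 2)
```

split() orders the groups alphabetically, so bar comes before foo in r. After rounding, the values should match the 0.97 and 0.85 given above.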

Answer 1

The problem is in what you pass to cor(). After nest() and pivot_wider(), foo and bar are list-columns whose cells hold tibbles. They are not numeric vectors, which is why cor() complains that 'x' must be numeric. I was able to get the following code to work:

dat %>%
  group_by(peptide_name) %>%
  # Here create list-columns
  nest() %>%
  mutate(pn = row_number()) %>%
  dplyr::select(pn, everything()) %>%
  pivot_wider(id_cols = -pn, names_from = peptide_name, values_from = data) %>%
  # foo[[1]] is foo's nested tibble: [[1]] is its predicted column, [[2]] its trueval column
  mutate(pearson_foo = cor(foo[[1]][[1]], foo[[1]][[2]], method = "pearson"),
         pearson_bar = cor(bar[[1]][[1]], bar[[1]][[2]], method = "pearson"))

However, I'd be curious to see whether anyone has a more elegant solution, since mine involves adding an extra column. I'll keep playing around with it and see if I can come up with something better...
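You can keep the list-columns and still avoid the hard-coded [[1]] indexing. A common tidyverse pattern is to nest each peptide's rows and then map over the list-column with purrr::map_dbl(), which returns one number per row. The following is a minimal sketch that assumes dplyr, tidyr and purrr are installed (library(tidyverse) attaches all three). The name res_nested is hypothetical.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same values as `dat` in the question.
dat <- tibble(
  peptide_name = rep(c("foo", "bar"), each = 6),
  predicted = c(1, 0.965193935171986, 1.002152924502, 1.13340754433401,
                1.24280233366, 1.43442435500686,
                1, 1.07873571757982, 1.141383975916, 1.247359728244,
                1.259245716526, 1.23549751707385),
  trueval = c(1, 1.174927114, 1.279883382, 1.752186589, 1.994169096,
              2.358600583,
              1, 0.977742448, 1.305246423, 1.500794913, 1.532591415,
              1.197138315)
)

res_nested <- dat %>%
  # One row per peptide; `data` is a list-column holding that peptide's rows
  nest(data = c(predicted, trueval)) %>%
  # map_dbl() walks the list-column and returns one numeric value per peptide
  mutate(pearson = map_dbl(data, ~ cor(.x$predicted, .x$trueval, method = "pearson")))

res_nested
```

The key point is that mutate() cannot hand a list-column straight to cor(). A mapping function has to unpack each cell first.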

Edit: Ritchie's answer with summarise() is way easier!
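The summarise() answer mentioned above is not reproduced on this page. What follows is only a sketch of how a group_by() + summarise() approach could produce the layout requested in the question (columns foo, bar, type). It assumes dplyr and tidyr are installed, and the object name res is hypothetical.

```r
library(dplyr)
library(tidyr)

# Same values as `dat` in the question.
dat <- tibble(
  peptide_name = rep(c("foo", "bar"), each = 6),
  predicted = c(1, 0.965193935171986, 1.002152924502, 1.13340754433401,
                1.24280233366, 1.43442435500686,
                1, 1.07873571757982, 1.141383975916, 1.247359728244,
                1.259245716526, 1.23549751707385),
  trueval = c(1, 1.174927114, 1.279883382, 1.752186589, 1.994169096,
              2.358600583,
              1, 0.977742448, 1.305246423, 1.500794913, 1.532591415,
              1.197138315)
)

res <- dat %>%
  group_by(peptide_name) %>%
  # One row per peptide: correlation of predicted vs. trueval within that peptide
  summarise(pearson = cor(predicted, trueval, method = "pearson"), .groups = "drop") %>%
  # Spread the peptides into columns to get the wide layout from the question
  pivot_wider(names_from = peptide_name, values_from = pearson) %>%
  mutate(type = "pearson_cor") %>%
  # summarise() sorts groups alphabetically, so reorder to foo, bar, type
  select(foo, bar, type)

res
```

This version never creates list-columns. summarise() reduces each group to a single number, and that is why no indexing into nested tibbles is needed.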

Solution 1
0 2022-04-14 03:08:23
