How to mutate two list columns with dplyr::mutate

I have the following data frame:

library(tidyverse)
dat <- structure(list(peptide_name = c(
  "foo", "foo", "foo",
  "foo", "foo", "foo", "bar", "bar", "bar",
  "bar", "bar", "bar"
), predicted = c(
  1, 0.965193935171986,
  1.002152924502, 1.13340754433401, 1.24280233366, 1.43442435500686,
  1, 1.07873571757982, 1.141383975916, 1.247359728244, 1.259245716526,
  1.23549751707385
), trueval = c(
  1, 1.174927114, 1.279883382, 1.752186589,
  1.994169096, 2.358600583, 1, 0.977742448, 1.305246423, 1.500794913,
  1.532591415, 1.197138315
)), row.names = c(NA, -12L), class = c(
  "tbl_df",
  "tbl", "data.frame"
))

dat

It looks like this:

   peptide_name predicted trueval
   <chr>            <dbl>   <dbl>
 1 foo              1       1    
 2 foo              0.965   1.17 
 3 foo              1.00    1.28 
 4 foo              1.13    1.75 
 5 foo              1.24    1.99 
 6 foo              1.43    2.36 
 7 bar              1       1    
 8 bar              1.08    0.978
 9 bar              1.14    1.31 
10 bar              1.25    1.50 
11 bar              1.26    1.53 
12 bar              1.24    1.20 

The foo and bar peptides each contain the same number of rows. What I want to do is compute the Pearson correlation between predicted and trueval for each of the two peptides.

The following code is my attempt:

dat %>%  
  group_by(peptide_name) %>% 
  # Here create list-columns
  nest() %>% 
  mutate(pn = row_number()) %>% 
  dplyr::select(pn, everything()) %>% 
  pivot_wider(-pn, names_from = peptide_name, values_from = data) %>% 
  # Attempt to calculate Pearson correlation
  mutate(pearson = cor(foo, bar, method = "pearson")) 

But it failed:

Error in `mutate()`:
! Problem while computing `pearson = cor(foo, bar, method =
  "pearson")`.
Caused by error in `cor()`:
! 'x' must be numeric

What's the right way to do it?

The final expected result of the correlation:

foo   bar  type
0.97 0.85  pearson_cor

The problem seems to be in how you are passing the arguments to the cor() function. I was able to get the following code to work:

dat %>%
  group_by(peptide_name) %>%
  # Here create list-columns
  nest() %>%
  mutate(pn = row_number()) %>%
  dplyr::select(pn, everything()) %>%
  pivot_wider(-pn, names_from = peptide_name, values_from = data) %>%
  # foo[[1]] is the nested tibble for "foo"; its columns [[1]] and [[2]]
  # are predicted and trueval, the numeric vectors that cor() needs
  mutate(pearson_foo = cor(foo[[1]][[1]], foo[[1]][[2]], method = "pearson"),
         pearson_bar = cor(bar[[1]][[1]], bar[[1]][[2]], method = "pearson"))
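To make the cause of the `'x' must be numeric` error concrete, here is a minimal sketch that inspects a list column after nesting and widening. It uses only the first three printed (rounded) rows of each peptide from the question; the names `small`, `wide` and `r_foo` are introduced here for illustration and are not part of the original code.

```r
library(dplyr)
library(tidyr)

# First three printed rows per peptide from the question (rounded values),
# rebuilt here so the sketch runs on its own.
small <- data.frame(
  peptide_name = rep(c("foo", "bar"), each = 3),
  predicted    = c(1, 0.965, 1.00, 1, 1.08, 1.14),
  trueval      = c(1, 1.17, 1.28, 1, 0.978, 1.31)
)

# Nest predicted/trueval per peptide, then spread peptides into columns.
wide <- small %>%
  nest(data = c(predicted, trueval)) %>%
  pivot_wider(names_from = peptide_name, values_from = data)

is.numeric(wide$foo)   # FALSE: the column is a list, which cor() rejects
is.list(wide$foo)      # TRUE
names(wide$foo[[1]])   # column names of the nested tibble for "foo"

# Extracting the numeric columns from the nested tibble is what cor() needs.
r_foo <- cor(wide$foo[[1]]$predicted, wide$foo[[1]]$trueval)
```

Extracting by name (`$predicted`, `$trueval`) does the same job as the positional `[[1]]` and `[[2]]` above, and does not depend on column order.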

However, I'd be curious to see if anyone has a more elegant solution to your problem, since my solution involves adding an extra column. I'll keep playing around with it and see if I can come up with something better...

Edit: Ritchie's answer with summarise() is way easier!
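For anyone who doesn't have that answer to hand, a summarise()-based version might look like the sketch below. This is a guess at the general approach, not a copy of Ritchie's answer. It assumes the goal is the correlation between predicted and trueval within each peptide, which is what the expected result in the question suggests. The names `res` and `wide` are introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Same values as `dat` in the question, rebuilt so the sketch runs on its own.
dat <- data.frame(
  peptide_name = rep(c("foo", "bar"), each = 6),
  predicted = c(
    1, 0.965193935171986, 1.002152924502,
    1.13340754433401, 1.24280233366, 1.43442435500686,
    1, 1.07873571757982, 1.141383975916,
    1.247359728244, 1.259245716526, 1.23549751707385
  ),
  trueval = c(
    1, 1.174927114, 1.279883382, 1.752186589, 1.994169096, 2.358600583,
    1, 0.977742448, 1.305246423, 1.500794913, 1.532591415, 1.197138315
  )
)

# One correlation per peptide; no list columns are needed.
res <- dat %>%
  group_by(peptide_name) %>%
  summarise(pearson_cor = cor(predicted, trueval, method = "pearson"))

# Optional: reshape to one row with a column per peptide plus a `type` label,
# similar to the layout requested in the question.
wide <- res %>%
  pivot_wider(names_from = peptide_name, values_from = pearson_cor) %>%
  mutate(type = "pearson_cor")

wide
```

Because the data stay in long form, cor() receives plain numeric vectors for each group, so there is no list column to unpack.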

