
dplyr: How do I calculate fold-change within group based on values in other column

My current data roughly has the following pattern:

Tree   Fertilized   Region   Fruits
apple  lightly      sunny    100
apple  lightly      dark     50
apple  heavily      sunny    300
apple  heavily      dark     200
pear   lightly      sunny    150
pear   lightly      dark     200
pear   heavily      sunny    300
pear   heavily      dark     150

Here I want to calculate (as part of a bigger function) the fold-change of placing the tree in a sunny place compared to a dark one within each combination of fertilization amount and type of tree (e.g., a 2-fold change for lightly fertilized apple trees):

df %<>%
  group_by(Tree, Fertilized) %>%
  summarise(!!paste0("fold_change_", quote(Fruits)) := .[Region == "sunny", "Fruits"] / .[type == "dark", "Fruits"])

However, I get an error saying that the "Fruits" column doesn't exist. Does anyone have a suggestion on how to get this working? I guess the solution is some minor syntax tweak, but I can't seem to find it myself or online.

The actual dataset has many more tree types and parameters like "Fruits", hence I picked the pipe structure and dynamic labelling of columns ("!!paste0()", ":="), which may or may not be relevant for solving this issue.

Thanks in advance to anyone trying to help!

Cheers, Rob

I would use a group-by operation:

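library(data.table)
library(dplyr)

# Write the example data to a temporary CSV file and read it back.
# Note: each field has a leading space after the comma, and read.csv()
# keeps it by default (strip.white = FALSE), so values load as e.g. " sunny".
f <- tempfile()
writeLines("
Tree,  Fertilized,  Region,  Fruits,
apple, lightly, sunny, 100,
apple, lightly, dark, 50,
apple, heavily, sunny, 300,
apple, heavily, dark, 200,
pear, lightly, sunny, 150,
pear, lightly, dark, 200,
pear, heavily, sunny, 300,
pear, heavily, dark, 150
", f)
dat <- read.csv(f)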

data.table

dat <- data.table(dat)

dat[order(Region), .(fold_change = Fruits[2] / Fruits[1]), by=.(Tree, Fertilized)]
#>     Tree Fertilized fold_change
#> 1: apple    lightly        2.00
#> 2: apple    heavily        1.50
#> 3:  pear    lightly        0.75
#> 4:  pear    heavily        2.00
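The positional Fruits[2] / Fruits[1] above works because sorting by Region puts "dark" before "sunny" in every group. As a minimal sketch of an alternative, the rows can be selected by their Region value instead, so the result does not depend on sort order. This assumes exactly one "sunny" and one "dark" row per Tree/Fertilized combination and Region values without stray whitespace; the data is rebuilt inline here, and res_dt is a name introduced only for illustration.

```r
library(data.table)

# Example data from the question, built directly (no leading spaces in values).
dat <- data.table(
  Tree       = rep(c("apple", "pear"), each = 4),
  Fertilized = rep(c("lightly", "lightly", "heavily", "heavily"), times = 2),
  Region     = rep(c("sunny", "dark"), times = 4),
  Fruits     = c(100, 50, 300, 200, 150, 200, 300, 150)
)

# Within each group, pick the sunny and dark rows by name and divide.
# Assumes one sunny and one dark row per group.
res_dt <- dat[
  ,
  .(fold_change = Fruits[Region == "sunny"] / Fruits[Region == "dark"]),
  by = .(Tree, Fertilized)
]
print(res_dt)
```

If a group could have several rows per Region, the values would need to be aggregated first (for example with mean()) before dividing.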

tidyverse

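# Sort by Region so that "dark" precedes "sunny" within each group,
# then divide the second value (sunny) by the first (dark).
dat %>%
  arrange(Region) %>%
  group_by(Tree, Fertilized) %>%
  summarize(fold_change = Fruits[2] / Fruits[1])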
#> `summarise()` regrouping output by 'Tree' (override with `.groups` argument)
#> # A tibble: 4 x 3
#> # Groups:   Tree [2]
#>   Tree  Fertilized fold_change
#>   <chr> <chr>            <dbl>
#> 1 apple " heavily"        1.5 
#> 2 apple " lightly"        2   
#> 3 pear  " heavily"        2   
#> 4 pear  " lightly"        0.75
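Since the question asks for a dynamically named output column inside a larger function, here is a minimal sketch of how that might look in dplyr. It is not the asker's original function: fold_change_by_region and value_col are hypothetical names introduced for illustration, and the sketch assumes dplyr 1.0.0 or later (for the .groups argument), exactly one "sunny" and one "dark" row per group, and Region values without leading whitespace. The column is referred to through the .data pronoun and the output name is injected with !! and :=.

```r
library(dplyr)

# Example data from the question, built directly (no leading spaces in values).
dat <- data.frame(
  Tree       = rep(c("apple", "pear"), each = 4),
  Fertilized = rep(c("lightly", "lightly", "heavily", "heavily"), times = 2),
  Region     = rep(c("sunny", "dark"), times = 4),
  Fruits     = c(100, 50, 300, 200, 150, 200, 300, 150)
)

# Hypothetical helper: value_col is the name of the measurement column (a string).
fold_change_by_region <- function(data, value_col) {
  out_name <- paste0("fold_change_", value_col)
  data %>%
    group_by(Tree, Fertilized) %>%
    summarise(
      # Select rows by Region value rather than by position within the group.
      !!out_name := .data[[value_col]][Region == "sunny"] /
        .data[[value_col]][Region == "dark"],
      .groups = "drop"
    )
}

res <- fold_change_by_region(dat, "Fruits")
print(res)
```

Calling the helper once per measurement column (for example inside a loop over column names) would give one fold_change_ column per parameter, which seems to be what the dynamic labelling in the question was aiming for.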
