
Tidyverse function

I am new(ish) to R and would appreciate your help. I have a dataset with a response variable measured at 5 levels of a treatment. Say I measured soil N content at 5 levels of soil water content (optimal, 40%, 30%, 20%, and 10%), with 5 replicates per level. For each replicate, I would like to calculate unstandardized differences (optimal - 40%, optimal - 30%, optimal - 20%, optimal - 10%) and standardized differences ((optimal - 40%) / optimal, (optimal - 30%) / optimal, and so on). Is there a way to do this in R with the tidyverse? I am still very new to writing loops, so it would be a great help if someone could answer with example code.

(As noted in my comment above, it will be easier to answer your questions on this forum if you share sample data, your current code, and your expected output. Then potential answerers can be more confident that they're actually answering your question, rather than a version of what your question sounds like.)

Here's an approach using dplyr, where we first calculate the mean for each treatment/level combination using group_by + summarize. Note that there are two grouping dimensions (treated + levels), and summarize "peels off" the last one applied (in this case levels). So after the summarize line, the data is still grouped by treated. We can use bracket [] notation to specify the level to use for standardization. In this case, I am dividing each value by the "optimal" value within its respective treated group.

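library(dplyr)
df %>%
  # one group per treatment/level combination
  group_by(treated, levels) %>%
  # mean per group; the result is still grouped by treated
  summarize(avg_raw = mean(values)) %>%
  # divide each mean by the "optimal" mean of the same treated group
  mutate(avg_standarized = avg_raw / avg_raw[levels == "optimal"]) %>%
  ungroup()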

Output

# A tibble: 10 x 4
   treated levels  avg_raw avg_standarized
   <lgl>   <chr>     <dbl>           <dbl>
 1 FALSE   10%       0.628           1.16 
 2 FALSE   20%       0.502           0.927
 3 FALSE   30%       0.370           0.684
 4 FALSE   40%       0.606           1.12 
 5 FALSE   optimal   0.541           1    
 6 TRUE    10%       0.608           1.55 
 7 TRUE    20%       0.371           0.945
 8 TRUE    30%       0.499           1.27 
 9 TRUE    40%       0.629           1.60 
10 TRUE    optimal   0.393           1   
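The "peeling off" behavior described above can be checked directly. This is only a small sketch using the answer's sample data (repeated here so the snippet runs on its own); the object names by_both and summarized are made up for illustration, and .groups = "drop_last" just spells out what summarize does by default when no .groups argument is given (dplyr 1.0.0 or later).

```r
library(dplyr)

# Sample data from the answer, repeated so this snippet is self-contained
df <- data.frame(stringsAsFactors = FALSE,
                 levels = rep(c("optimal", "40%", "30%", "20%", "10%"), 4),
                 treated = rep(c(TRUE, FALSE), each = 10),
                 values = (sin(1:20)^2))

# Grouped by both variables before summarizing
by_both <- df %>% group_by(treated, levels)
group_vars(by_both)      # "treated" "levels"

# summarize() drops the last grouping variable (levels);
# .groups = "drop_last" makes that default explicit
summarized <- by_both %>%
  summarize(avg_raw = mean(values), .groups = "drop_last")
group_vars(summarized)   # "treated"
```

Because the result is still grouped by treated, the later mutate() looks up the "optimal" row separately within each treated group.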

Sample data

df <- data.frame(stringsAsFactors = FALSE,
                 levels = rep(c("optimal", "40%", "30%", "20%", "10%"), 4),
                 treated = rep(c(TRUE, FALSE), each = 10),
                 values = (sin(1:20)^2))
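The question also asked for per-replicate differences from optimal, which the ratio of means above does not give directly. One possible sketch follows, under a loud assumption: the sample data has no replicate identifier, so a hypothetical replicate column is created here by numbering rows within each treated/level group. With real data, an existing replicate ID column should be used instead. The names replicate, optimal_value, diff_raw and diff_standardized are illustrative, not from the original answer.

```r
library(dplyr)

# Sample data from the answer, repeated so this snippet is self-contained
df <- data.frame(stringsAsFactors = FALSE,
                 levels = rep(c("optimal", "40%", "30%", "20%", "10%"), 4),
                 treated = rep(c(TRUE, FALSE), each = 10),
                 values = (sin(1:20)^2))

diffs <- df %>%
  # ASSUMPTION: replicates are matched by row order within treated/level
  group_by(treated, levels) %>%
  mutate(replicate = row_number()) %>%
  # regroup so each group holds one replicate across all five levels
  group_by(treated, replicate) %>%
  mutate(optimal_value = values[levels == "optimal"],
         diff_raw = optimal_value - values,                # optimal - level
         diff_standardized = diff_raw / optimal_value) %>% # (optimal - level) / optimal
  ungroup() %>%
  # the optimal rows would only show a difference of zero
  filter(levels != "optimal")
```

This keeps the data in long format, so no loop is needed: each replicate's "optimal" value is picked out with the same bracket notation used in the answer, just within a different grouping.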
