List of data frames: creating a new column with normalisation values for each data frame
I'm new to R and mostly work with data frames. A frequent task is to normalize counts for several parameters from several data frames. I have a demo dataset, `dataset`:

| Season | Product | Quality | Sales |
|---|---|---|---|
| Winter | Apple | bad | 345 |
| Winter | Apple | good | 13 |
| Winter | Potato | bad | 23 |
| Winter | Potato | good | 66 |
| Winter | Beer | bad | 345 |
| Winter | Beer | good | 34 |
| Summer | Apple | bad | 88 |
| Summer | Apple | good | 90 |
| Summer | Potato | bad | 123 |
| Summer | Potato | good | 457 |
| Summer | Beer | bad | 44 |
| Summer | Beer | good | 546 |

What I want to do is add a column "FC" (fold change) for "Sales". FC must be calculated for each "Season" and "Product" according to "Quality", with "bad" as the baseline.

Desired result:

| Season | Product | Quality | Sales | FC |
|---|---|---|---|---|
| Winter | Apple | bad | 345 | 1.00 |
| Winter | Apple | good | 13 | 0.04 |
| Winter | Potato | bad | 23 | 1.00 |
| Winter | Potato | good | 66 | 2.87 |
| Winter | Beer | bad | 345 | 1.00 |
| Winter | Beer | good | 34 | 0.10 |
| Summer | Apple | bad | 88 | 1.00 |
| Summer | Apple | good | 90 | 1.02 |
| Summer | Potato | bad | 123 | 1.00 |
| Summer | Potato | good | 457 | 3.72 |
| Summer | Beer | bad | 44 | 1.00 |
| Summer | Beer | good | 546 | 12.41 |

One way to do it is to filter first by "Season" and then by "Product" (e.g. creating a subset data frame `subset_winter_apple`) and then calculate FC like this: `subset_winter_apple$FC = subset_winter_apple$Sales / subset_winter_apple$Sales[1]`. Later on, I can combine all the subset data frames again, e.g. using `rbind`, to reconstitute the original data frame with the FC column. However, this is highly inefficient.

So I thought of splitting the data frame into a list: `split(dataset, list(dataset$Season, dataset$Product))`.

However, now I struggle with the normalisation (FC calculation), as I do not know how to reference the first cell value of "Sales" within each data frame in the list, so that the column is normalized individually for each data frame. I did manage to calculate an FC value for the list using `lapply`; however, every data frame in the list gets an exact copy of the values from the first one:

    lapply(dataset, function(DF){DF$FC = dataset[[1]]$Sales/dataset[[1]]$Sales[1]; DF})

Clearly, I do not know how to reference the first cell of a specific column in order to normalize the entire column for each data frame in the list. Can somebody please help me?

Many thanks in advance for your suggestions.

Using logical indexing within a grouped `mutate()`:

    library(dplyr)

    dataset %>%
      group_by(Season, Product) %>%
      mutate(FC = Sales / Sales[Quality == "bad"]) %>%
      ungroup()

    # A tibble: 12 × 5
       Season Product Quality Sales      FC
       <chr>  <chr>   <chr>   <int>   <dbl>
     1 Winter Apple   bad       345  1
     2 Winter Apple   good       13  0.0377
     3 Winter Potato  bad        23  1
     4 Winter Potato  good       66  2.87
     5 Winter Beer    bad       345  1
     6 Winter Beer    good       34  0.0986
     7 Summer Apple   bad        88  1
     8 Summer Apple   good       90  1.02
     9 Summer Potato  bad       123  1
    10 Summer Potato  good      457  3.72
    11 Summer Beer    bad        44  1
    12 Summer Beer    good      546 12.4

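The indexing `Sales[Quality == "bad"]` relies on every Season/Product group containing exactly one "bad" row. If that might not hold in real data, a small guard can make the assumption explicit. The following is only a minimal base R sketch; `fold_change()` is a hypothetical helper name introduced here for illustration, not a dplyr function.

```r
# Hypothetical helper (an assumption for illustration, not part of dplyr):
# fold change relative to a single baseline row, with an explicit check
# that the group contains exactly one such row.
fold_change <- function(sales, quality, baseline = "bad") {
  base_idx <- which(quality == baseline)  # which() ignores NA entries
  if (length(base_idx) != 1L) {
    stop("expected exactly one '", baseline, "' row per group, found ",
         length(base_idx))
  }
  sales / sales[base_idx]
}

# Winter/Apple values from the question: 345 (bad) and 13 (good)
fc <- fold_change(c(345, 13), c("bad", "good"))
fc
```

Inside the grouped pipeline it could then be called as `mutate(FC = fold_change(Sales, Quality))`, so that a group with a missing or duplicated baseline stops with an error instead of producing misleading values.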

Using `by()`:

    dataset <- by(
      dataset,
      list(dataset$Season, dataset$Product),
      \(x) transform(x, FC = Sales / Sales[Quality == "bad"])
    )
    dataset <- do.call(rbind, dataset)
    dataset[order(as.numeric(rownames(dataset))), ]

       Season Product Quality Sales          FC
    1  Winter   Apple     bad   345  1.00000000
    2  Winter   Apple    good    13  0.03768116
    3  Winter  Potato     bad    23  1.00000000
    4  Winter  Potato    good    66  2.86956522
    5  Winter    Beer     bad   345  1.00000000
    6  Winter    Beer    good    34  0.09855072
    7  Summer   Apple     bad    88  1.00000000
    8  Summer   Apple    good    90  1.02272727
    9  Summer  Potato     bad   123  1.00000000
    10 Summer  Potato    good   457  3.71544715
    11 Summer    Beer     bad    44  1.00000000
    12 Summer    Beer    good   546 12.40909091

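The `split()` / `lapply()` attempt from the question can probably be made to work as well. The likely issue there is that the function body refers to `dataset[[1]]` rather than to its own argument `DF`, so every piece receives values computed from the first element. A minimal base R sketch, assuming the demo data is re-typed by hand as below (`pieces` and `result` are names chosen here for illustration):

```r
# Demo data re-typed from the table in the question.
dataset <- data.frame(
  Season  = rep(c("Winter", "Summer"), each = 6),
  Product = rep(rep(c("Apple", "Potato", "Beer"), each = 2), times = 2),
  Quality = rep(c("bad", "good"), times = 6),
  Sales   = c(345, 13, 23, 66, 345, 34, 88, 90, 123, 457, 44, 546)
)

# One data frame per Season/Product combination.
groups <- list(dataset$Season, dataset$Product)
pieces <- split(dataset, groups)

# Inside the function, use DF (the current piece), not the whole list.
pieces <- lapply(pieces, function(DF) {
  DF$FC <- DF$Sales / DF$Sales[DF$Quality == "bad"]
  DF
})

# unsplit() reverses split() and restores the original row order.
result <- unsplit(pieces, groups)
result
```

Indexing by `Quality == "bad"` instead of `Sales[1]` avoids depending on the row order within each piece, at the cost of assuming one "bad" row per group.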