Create empty missing lines for values in multi-group multi-level data-structures and calculate difference between rows within groups
Let's say I have the following dataset:
ID Type Group Week Value
111 A Pepper -1 10
112 B Salt 2 20
113 C Curry 4 40
114 D Rosemary 9 90
211 A Pepper -1 15
212 B Salt 2 30
214 D Rosemary 9 135
ID, Type, Group and Week are entered into a measurement instrument that records "Value" each week. Sometimes there are multiple results per week, so the initial tidying step was to take the mean of each week's measurements.
I would like to:
a) create a dataset in which rows are automatically inserted wherever a Week is missing, so that it looks like this, always with the Type order A, B, C, D, the Group order Pepper, Salt, Curry, Rosemary and the Week order -1, 2, 4, 9:
ID Type Group Week Value
111 A Pepper -1 10
112 B Salt 2 20
113 C Curry 4 40
114 D Rosemary 9 90
211 A Pepper -1 15
212 B Salt 2 30
213 C Curry 4 60
214 D Rosemary 9 135
b) calculate the difference between consecutive measured values (row to row, down the Value column), within each group only, i.e.:
ID Type Group Week Value Diff
111 A Pepper -1 10 NA
112 B Salt 2 20 10
113 C Curry 4 40 20
114 D Rosemary 9 90 50
211 A Pepper -1 15 NA
212 B Salt 2 30 15
213 C Curry 4 60 30
214 D Rosemary 9 135 75
I can see how to do this in a for loop, but there must be a more elegant way?
I'm not sure this will be helpful, but I thought it might be a start.
If you have repeating groups of rows, I would create a generic data frame, repeat it multiple times, and then join it with your available data set. This effectively inserts the rows that are missing.
Also, if you use tidyverse, you can calculate the diff by using lag.
Note this will not give the exact same result, as I was not sure where the 60 for Curry came from (will edit answer later).
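As a small illustration of how lag behaves around a missing row, here is a minimal sketch using the Value column of the second block, with NA standing in for the missing Curry row. The vector names are made up for this example.

```r
library(dplyr)

# Values of the second block; NA marks the missing Curry measurement
value <- c(15, 30, NA, 135)

# lag() shifts the vector down by one, so each element is paired
# with the element from the previous row
diff_from_previous <- value - lag(value)

print(diff_from_previous)
```

Any difference that involves the NA is itself NA, so both the missing row and the row after it end up without a Diff.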
library(tidyverse)
# Define number of repeating groups
N = 2
# Create generic group of Type, Group, Week
df <- data.frame(
  Type = c("A", "B", "C", "D"),
  Group = c("Pepper", "Salt", "Curry", "Rosemary"),
  Week = c(-1, 2, 4, 9)
)
# Represents the number of rows
nrow_df <- nrow(df)
# Repeat groups of rows N times
full_df <- df[rep(seq_len(nrow_df), times = N), ]
# Add ID numbers
full_df$ID <- rep(seq(110, (100 * N) + 10, by = 100), each = nrow_df) + seq_len(nrow_df)
# Second data frame with missing rows
df2 <- read.table(text =
"ID Type Group Week Value
111 A Pepper -1 10
112 B Salt 2 20
113 C Curry 4 40
114 D Rosemary 9 90
211 A Pepper -1 15
212 B Salt 2 30
214 D Rosemary 9 135", header = TRUE, stringsAsFactors = TRUE)
# Join the data frames and get differences
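full_df %>%
  # Name the join keys explicitly instead of relying on the natural join
  left_join(df2, by = c("ID", "Type", "Group", "Week")) %>%
  # Every nrow_df consecutive rows form one repeating group
  group_by(grp = ceiling(row_number() / nrow_df)) %>%
  mutate(Diff = Value - lag(Value))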
# A tibble: 8 x 7
# Groups: grp [2]
Type Group Week ID Value grp Diff
<fct> <fct> <dbl> <dbl> <int> <dbl> <int>
1 A Pepper -1 111 10 1 NA
2 B Salt 2 112 20 1 10
3 C Curry 4 113 40 1 20
4 D Rosemary 9 114 90 1 50
5 A Pepper -1 211 15 2 NA
6 B Salt 2 212 30 2 15
7 C Curry 4 213 NA 2 NA
8 D Rosemary 9 214 135 2 NA
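As an alternative sketch (not part of the answer above), tidyr's complete() together with nesting() can insert the missing rows without building the template by hand. This assumes that the hundreds digit of ID identifies each repeating block and that ID follows the block * 100 + 10 + position pattern seen in the sample data. Block and type_order are names introduced here for illustration. The missing Value is left as NA rather than guessed.

```r
library(dplyr)
library(tidyr)

# Sample data with the 213 / Curry row missing
df2 <- read.table(text =
"ID Type Group Week Value
111 A Pepper -1 10
112 B Salt 2 20
113 C Curry 4 40
114 D Rosemary 9 90
211 A Pepper -1 15
212 B Salt 2 30
214 D Rosemary 9 135", header = TRUE, stringsAsFactors = FALSE)

# Assumed fixed order of Type within each block
type_order <- c("A", "B", "C", "D")

result <- df2 %>%
  # Assumption: the hundreds digit of ID identifies the block
  mutate(Block = ID %/% 100) %>%
  # Add every observed Type/Group/Week combination to every block
  complete(Block, nesting(Type, Group, Week)) %>%
  # Rebuild ID for the inserted rows from the assumed pattern
  mutate(ID = Block * 100 + 10 + match(Type, type_order)) %>%
  arrange(Block, Week) %>%
  group_by(Block) %>%
  mutate(Diff = Value - lag(Value)) %>%
  ungroup()

print(result)
```

With the sample data this gives 8 rows: row 213 is present, but its Value is NA, and so are the Diff entries that depend on it.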