
Create missing rows for values in multi-group, multi-level data structures and calculate the difference between rows within groups

Let's say I have the following dataset:

ID  Type  Group      Week    Value
111 A      Pepper     -1      10
112 B      Salt        2      20
113 C      Curry       4      40
114 D      Rosemary    9      90
211 A      Pepper     -1      15
212 B      Salt        2      30
214 D      Rosemary    9      135

ID, Type, Group and Week are entered into a measurement instrument that records a "Value" each week. Sometimes there are several results per week, so the initial tidying step was to compute the mean of each weekly measurement.

I would like to:

a) create a dataset in which rows are automatically inserted wherever a Week is missing, so that it looks like the table below. The order is always Type A, B, C, D; Group Pepper, Salt, Curry, Rosemary; and Week -1, 2, 4, 9.

ID  Type  Group      Week    Value
111 A      Pepper     -1      10
112 B      Salt        2      20
113 C      Curry       4      40
114 D      Rosemary    9      90
211 A      Pepper     -1      15
212 B      Salt        2      30
213 C      Curry       4      60
214 D      Rosemary    9      135
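For part a), one tidyverse option is tidyr's complete(), which adds a row for every missing combination of the columns you name and fills the remaining columns with NA. This is a minimal sketch and not the answerer's method. It assumes the block is the hundreds digit of ID, and that missing IDs follow the hypothetical pattern block * 100 + 10 + position. Because the value 60 for ID 213 cannot be derived from the observed data, Value stays NA for that row.

```r
library(dplyr)
library(tidyr)

# Observed data from the question; the row for ID 213 (C, Curry, week 4) is missing
obs <- data.frame(
  ID    = c(111, 112, 113, 114, 211, 212, 214),
  Type  = c("A", "B", "C", "D", "A", "B", "D"),
  Group = c("Pepper", "Salt", "Curry", "Rosemary", "Pepper", "Salt", "Rosemary"),
  Week  = c(-1, 2, 4, 9, -1, 2, 9),
  Value = c(10, 20, 40, 90, 15, 30, 135)
)

type_order <- c("A", "B", "C", "D")

filled <- obs %>%
  # Assumption: the hundreds digit of ID identifies the block (111-114 -> 1, 211-214 -> 2)
  mutate(Block = ID %/% 100) %>%
  # Give every block every Type/Group/Week combination that occurs anywhere in the data
  complete(Block, nesting(Type, Group, Week)) %>%
  # Hypothetical ID scheme: Block * 100 + 10 + position of Type in A, B, C, D
  mutate(ID = coalesce(ID, Block * 100 + 10 + match(Type, type_order))) %>%
  arrange(Block, match(Type, type_order)) %>%
  select(ID, Type, Group, Week, Value)

print(filled)
```

nesting() only uses combinations that appear somewhere in the data. A combination missing from every block would have to be supplied explicitly, for example through a template data frame joined in as the answer below does.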

b) calculate the difference between consecutive measured values, going down the rows, separately for each block of IDs (111 to 114 and 211 to 214), i.e.:

ID  Type  Group      Week    Value  Diff
111 A      Pepper     -1      10     NA
112 B      Salt        2      20     10
113 C      Curry       4      40     20 
114 D      Rosemary    9      90     50
211 A      Pepper     -1      15     NA
212 B      Salt        2      30     15
213 C      Curry       4      60     30
214 D      Rosemary    9      135    75

I can see how to do this with a for loop, but is there a more elegant way?
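Part b) does not need a loop. Base R's ave() applies a function within each group and returns a vector aligned with the original rows, so c(NA, diff(v)) gives the per-block difference directly. This is a minimal sketch of a swapped-in base-R technique. It assumes the block is the hundreds digit of ID and that rows are already sorted by ID within each block. It uses the filled-in values from the expected table above, including 60 for ID 213.

```r
# Filled-in data from the expected output in the question
dat <- data.frame(
  ID    = c(111, 112, 113, 114, 211, 212, 213, 214),
  Value = c(10, 20, 40, 90, 15, 30, 60, 135)
)

# Assumption: the hundreds digit of ID identifies the block
block <- dat$ID %/% 100

# Within each block: NA for the first row, then the difference to the previous row
dat$Diff <- ave(dat$Value, block, FUN = function(v) c(NA, diff(v)))

print(dat)
```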

I'm not sure this will be helpful, but I thought it might be a start.

If your rows come in repeating groups, you could create a generic data frame, repeat it as many times as needed, and then join it with your available data set. This effectively inserts the missing rows.

Also, if you use the tidyverse, you can calculate the difference with lag().

Note that this will not give exactly the same result, because I was not sure where the value 60 for Curry came from (will edit the answer later).
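On its own, dplyr's lag() shifts a vector down by one position and pads the start with NA. Subtracting it from the original vector therefore gives the row-to-row difference. Here is a small illustration using the first block's values. Loading dplyr (or the tidyverse) matters, because otherwise lag() resolves to stats::lag(). That function only shifts the time base of a time series and leaves a plain vector's values unchanged, so the subtraction would not give the differences you want.

```r
values <- c(10, 20, 40, 90)

# Explicitly use dplyr's lag(), not stats::lag()
shifted <- dplyr::lag(values)
diffs   <- values - dplyr::lag(values)

print(shifted)  # NA 10 20 40
print(diffs)    # NA 10 20 50
```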

library(tidyverse)

# Define the number of repeating groups
N <- 2

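# Create the generic group of Type, Group, Week.
# stringsAsFactors = TRUE keeps these key columns as factors, matching df2 below
# (since R 4.0.0, data.frame() no longer converts strings to factors by default)
df <- data.frame(
  Type = c("A", "B", "C", "D"),
  Group = c("Pepper", "Salt", "Curry", "Rosemary"),
  Week = c(-1, 2, 4, 9),
  stringsAsFactors = TRUE
)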

# Represents the number of rows
nrow_df <- nrow(df)

# Repeat groups of rows N times
full_df <- df[rep(seq_len(nrow_df), times = N), ]

# Add ID numbers: 111-114 for the first group, 211-214 for the second, and so on
full_df$ID <- rep(seq(110, (100 * N) + 10, by = 100), each = nrow_df) + seq_len(nrow_df)

# Second data frame with missing rows
df2 <- read.table(text =
"ID  Type  Group      Week    Value
111 A      Pepper     -1      10
112 B      Salt        2      20
113 C      Curry       4      40
114 D      Rosemary    9      90
211 A      Pepper     -1      15
212 B      Salt        2      30
214 D      Rosemary    9      135", header = TRUE, stringsAsFactors = TRUE)

# Join the data frames on the key columns, then take differences within each block of nrow_df rows
full_df %>%
  left_join(df2, by = c("ID", "Type", "Group", "Week")) %>%
  group_by(grp = ceiling(row_number() / nrow_df)) %>%
  mutate(Diff = Value - lag(Value))

# A tibble: 8 x 7
# Groups:   grp [2]
  Type  Group     Week    ID Value   grp  Diff
  <fct> <fct>    <dbl> <dbl> <int> <dbl> <int>
1 A     Pepper      -1   111    10     1    NA
2 B     Salt         2   112    20     1    10
3 C     Curry        4   113    40     1    20
4 D     Rosemary     9   114    90     1    50
5 A     Pepper      -1   211    15     2    NA
6 B     Salt         2   212    30     2    15
7 C     Curry        4   213    NA     2    NA
8 D     Rosemary     9   214   135     2    NA

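Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.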

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM