How to calculate average change over a variable number of columns using score data
I am trying to find the average difference in a score over repeated measures. The problem is that not every observation is measured equally often. The values in the columns are scores on a 6-point scale.
The data is available in both long and wide format. The wide format looks like this:
ID  Type  M1  M2  M3  M4  M6
1   A     5   5   3
2   A     4   3   1
3   A     2   5   3   5   5
4   C     5   4   4   3
5   B     3
6   F     4   2   3   4   1
This is the alternative (long) format:
ID Type M Score
1 A 1 5
1 A 2 5
1 A 3 3
2 A 1 4
2 A 2 3
2 A 3 1
4 C 1 5
4 C 2 4
4 C 3 4
4 C 4 3
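For reference, the wide table can be reshaped into this long layout. A minimal sketch, assuming tidyr 1.0.0 or later (for pivot_longer) and assuming the blank cells in the wide table are trailing missing values; the object names wide and long are only illustrative:

```r
library(dplyr)
library(tidyr)

# Wide table from the question. Blank cells are entered as NA, assuming
# the scores of each ID fill the measurement columns from the left.
wide <- data.frame(
  ID   = 1:6,
  Type = c("A", "A", "A", "C", "B", "F"),
  M1   = c(5, 4, 2, 5, 3, 4),
  M2   = c(5, 3, 5, 4, NA, 2),
  M3   = c(3, 1, 3, 4, NA, 3),
  M4   = c(NA, NA, 5, 3, NA, 4),
  M6   = c(NA, NA, 5, NA, NA, 1)
)

# One row per ID and measurement; measurements that were never taken are dropped.
long <- wide %>%
  pivot_longer(
    cols = -c(ID, Type),
    names_to = "M",
    names_prefix = "M",
    values_to = "Score",
    values_drop_na = TRUE
  ) %>%
  mutate(M = as.integer(M))  # "1", "2", ... become sortable integers
```

Storing M as an integer rather than a factor keeps later sorting by measurement order unambiguous.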
I am not really interested in the interim values. I need the difference between M1 and whatever the last measurement is for that ID, and then the average of those differences. I will need to do this across all types first, and later broken down by type.
Packages installed are: dplyr, purrr, stringr, tidyr, tibble, data.table
The closest I got was the following:
df %>% group_by(M) %>%
  arrange(M) %>%
  summarize(avg = as.numeric(mean(diff(Score))),
            sd = as.numeric(sd(diff(Score))))
and
df %>% group_by(Type) %>%
  arrange(M) %>%
  summarize(avg = as.numeric(mean(diff(Score))),
            sd = as.numeric(sd(diff(Score))))
This was done on the long format data and gave the result:
M avg sd
<fctr> <dbl> <dbl>
1 1 NA NA
2 2 NA NA
3 3 -0.03370787 1.741534
4 4 -0.04878049 2.036556
5 5 -0.18181818 1.887760
6 6 0.00000000 1.095445
7 7 NaN NA
8 8 NaN NA
9 9 NaN NA
10 <NA> -0.16666667 1.722401
The table above is taken from my actual analysis and is not related to the example tables. The NA and NaN values are a problem: I know there is data in some of those groups, but the average difference cannot be calculated for them.
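As a side note on where NaN and NA can come from: the real data is not shown, so the following is only a guess at the cause, using made-up score vectors. Note also that grouping by M makes diff() compare scores of different IDs at the same measurement, not successive scores within one ID.

```r
# A group holding a single score has no consecutive differences at all.
single <- c(3)
avg_single <- mean(diff(single))   # mean of an empty vector is NaN
sd_single  <- sd(diff(single))     # sd of an empty vector is NA

# Two scores give exactly one difference: the mean exists, the SD does not.
pair <- c(5, 3)
avg_pair <- mean(diff(pair))       # -2
sd_pair  <- sd(diff(pair))         # NA, because sd needs at least two values

# A missing score propagates through diff() and then through mean().
with_gap <- c(5, NA, 3)
avg_gap <- mean(diff(with_gap))    # NA
```

If missing scores turn out to be the cause, dropping NA rows before summarising is one option.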
One solution, based on OP feedback, is to use dplyr to calculate, per ID, the absolute difference between the first and last measurement divided by the number of measurements:
library(dplyr)
df %>% group_by(ID) %>%
  arrange(M) %>%
  summarise(avg = abs(first(Score) - last(Score))/n())
#Result
# ID avg
# <int> <dbl>
#1 1 0.667
#2 2 1.00
#3 4 0.500
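If what is needed is the average of the first-to-last differences across IDs, as described in the question, a variation along these lines might work. This is only a sketch under a few assumptions: the change is taken as last minus first (signed, so a drop is negative), M is numeric so that sorting gives measurement order, and IDs with a single measurement are left out. The names per_id, overall and by_type are illustrative.

```r
library(dplyr)

# Long-format rows copied from the question (IDs 1, 2 and 4).
df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2, 4, 4, 4, 4),
  Type  = c("A", "A", "A", "A", "A", "A", "C", "C", "C", "C"),
  M     = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4),
  Score = c(5, 5, 3, 4, 3, 1, 5, 4, 4, 3)
)

# One row per ID: change from the first to the last available measurement.
per_id <- df %>%
  filter(!is.na(Score)) %>%      # ignore measurements that were not taken
  arrange(ID, M) %>%             # put each ID's rows in measurement order
  group_by(ID, Type) %>%
  filter(n() > 1) %>%            # an ID measured once has no change to report
  summarise(change = last(Score) - first(Score)) %>%
  ungroup()

# Average change across all IDs.
overall <- per_id %>%
  summarise(avg_change = mean(change), sd_change = sd(change))

# The same summary broken down by Type.
by_type <- per_id %>%
  group_by(Type) %>%
  summarise(avg_change = mean(change), sd_change = sd(change), n_ids = n())
```

A Type with only one ID still gets an average, but its SD is NA because a standard deviation needs at least two values.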
The actual average and SD of the scores for each ID can be calculated as:
df %>% group_by(ID) %>%
  arrange(M) %>%
  summarise(avg = mean(Score), SD = sd(Score))
#Result
# ID avg SD
# <int> <dbl> <dbl>
#1 1 4.33 1.15
#2 2 2.67 1.53
#3 4 4.00 0.816