How to calculate average change over a variable number of columns using score data

I am trying to find the average difference in a score over repeated measures. The problem is that not every observation is measured equally often, and the values in the columns represent scores on a 6-point scale.

The data is available in both long and wide format. The wide format looks like this:

ID    Type    M1    M2    M3    M4    M6
1      A       5     5    3
2      A       4     3    1
3      A       2     5    3     5      5
4      C       5     4    4     3
5      B       3 
6      F       4     2    3     4      1

This is the alternative (long) format:

ID    Type    M    Score
1       A     1      5
1       A     2      5
1       A     3      3
2       A     1      4
2       A     2      3
2       A     3      1
4       C     1      5
4       C     2      4
4       C     3      4
4       C     4      3

I am not really interested in the interim values. I need the difference between M1 and whatever is the last measurement for that ID, and then the average of those differences. I will need to do this across all types first and later broken down by type.

Packages installed are: dplyr, purrr, stringr, tidyr, tibble, data.table

The closest I got was the following:

df %>%
  group_by(M) %>%
  arrange(M) %>%
  summarize(avg = as.numeric(mean(diff(Score))),
            sd  = as.numeric(sd(diff(Score))))

and

df %>%
  group_by(Type) %>%
  arrange(M) %>%
  summarize(avg = as.numeric(mean(diff(Score))),
            sd  = as.numeric(sd(diff(Score))))

This was done on the long format data and gave the result:

       M           avg       sd
     <fctr>       <dbl>    <dbl>
 1            1          NA       NA
 2            2          NA       NA
 3            3 -0.03370787 1.741534
 4            4 -0.04878049 2.036556
 5            5 -0.18181818 1.887760
 6            6  0.00000000 1.095445
 7            7         NaN       NA
 8            8         NaN       NA
 9            9         NaN       NA
10         <NA> -0.16666667 1.722401

The table above is taken from my actual analysis and is not related to the example tables. The NA and NaN values are a problem: I know there is data in some of those groups, but the average difference could not be calculated for them.

One option, based on the OP's feedback, is to use dplyr to compute, for each ID, the absolute difference between the first and last measurement divided by the number of measurements:

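library(dplyr)

# Sort by ID and measurement number so first()/last() follow M within each ID
df %>%
  arrange(ID, M) %>%
  group_by(ID) %>%
  # absolute first-to-last change, divided by the number of measurements
  summarise(avg = abs(first(Score) - last(Score)) / n())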

#Result
#     ID   avg
#  <int> <dbl>
#1     1 0.667
#2     2 1.00 
#3     4 0.500
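
The question asks for the signed difference between M1 and the last measurement of each ID, averaged first over all IDs and then by Type. A minimal sketch of that reading, assuming the long-format example data from the question (rebuilt here as `df`) and assuming the change is defined as last score minus first score; the names `change_per_id`, `overall`, and `by_type` are made up for illustration:

```r
library(dplyr)

# Long-format example data copied from the question
df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 2, 4, 4, 4, 4),
  Type  = c("A", "A", "A", "A", "A", "A", "C", "C", "C", "C"),
  M     = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4),
  Score = c(5, 5, 3, 4, 3, 1, 5, 4, 4, 3)
)

# One row per ID: last score minus first score, ordered by M within each ID
change_per_id <- df %>%
  arrange(ID, M) %>%
  group_by(ID, Type) %>%
  summarise(change = last(Score) - first(Score)) %>%
  ungroup()

# Average change (and its SD) across all IDs
overall <- change_per_id %>%
  summarise(avg_change = mean(change), sd_change = sd(change))

# Average change (and its SD) broken down by Type
by_type <- change_per_id %>%
  group_by(Type) %>%
  summarise(avg_change = mean(change), sd_change = sd(change))

print(change_per_id)
print(overall)
print(by_type)
```

With this data the per-ID changes are -2, -3 and -2. Note that `sd()` returns NA for a group with a single ID (Type C here), which may be one source of the NA values in the question. An ID with only one measurement would get a change of 0; if that is not wanted, such IDs could be dropped with `filter(n() > 1)` right after `group_by()`.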

The actual average and SD of the scores for each ID can be calculated as:

df %>% group_by(ID) %>%
  arrange(M) %>%
  summarise(avg = mean(Score), SD = sd(Score))

#Result
#     ID   avg    SD
#  <int> <dbl> <dbl>
#1     1  4.33 1.15
#2     2  2.67 1.53
#3     4  4.00 0.816
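
For the wide format, the last available measurement can be picked per row without reshaping. The sketch below assumes the blank cells of the wide example table are NA, uses `dplyr::coalesce()` to take the first non-missing value when scanning from the last column back to M1, and drops IDs with only one measurement; `wide`, `score_cols`, and `change_per_id` are illustrative names, not part of the original post:

```r
library(dplyr)

# Wide-format example data from the question, with blanks entered as NA
wide <- data.frame(
  ID   = 1:6,
  Type = c("A", "A", "A", "C", "B", "F"),
  M1   = c(5, 4, 2, 5, 3, 4),
  M2   = c(5, 3, 5, 4, NA, 2),
  M3   = c(3, 1, 3, 4, NA, 3),
  M4   = c(NA, NA, 5, 3, NA, 4),
  M6   = c(NA, NA, 5, NA, NA, 1)
)

score_cols <- c("M1", "M2", "M3", "M4", "M6")

change_per_id <- wide %>%
  mutate(
    # number of non-missing measurements in each row
    n_measured = rowSums(!is.na(wide[score_cols])),
    # last available score: first non-NA when scanning from M6 back to M1
    last_score = coalesce(M6, M4, M3, M2, M1),
    change     = last_score - M1
  ) %>%
  filter(n_measured > 1) %>%   # an ID measured once has no change to report
  select(ID, Type, change)

# Average change over all IDs, then by Type
overall_avg <- mean(change_per_id$change)
by_type <- change_per_id %>%
  group_by(Type) %>%
  summarise(avg_change = mean(change), n_ids = n())

print(change_per_id)
print(overall_avg)
print(by_type)
```

Listing the columns by hand in `coalesce()` is fine for a handful of measurements; with many columns, reshaping to long format (for example with `tidyr::pivot_longer()`) and then using `first()` and `last()` per ID is probably easier to maintain.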
