
Frequency table with mean and SD in R with multiple cases per row

I'd like to create a frequency table that gives the mean and SD of consultations per year, based on the following (dummy) data:

     id icpc icpc2       date
 1: 123  D95   F15 2015-06-19
 2: 123  F85       2016-08-15
 3: 332  A01       2010-03-16
 4: 332  A04       2018-01-20
 5: 332  K20       2017-02-20
 6: 100  B10       2017-06-01
 7: 100  A04       2008-01-11
 8: 113  T08       2018-03-18
 9: 113  P28       2017-01-19
10: 113  D95   A01 2013-01-16
11: 113  A04       2009-05-01
12: 551  B12   A01 2011-04-03
13: 551  D95       2015-05-09

Reproducible data:

df <- structure(list(id = c(123L, 123L, 332L, 332L, 332L, 100L, 100L, 
113L, 113L, 113L, 113L, 551L, 551L), icpc = c("D95", "F85", "A01", 
"A04", "K20", "B10", "A04", "T08", "P28", "D95", "A04", "B12", 
"D95"), icpc2 = c("F15", "", "", "", "", "", "", "", "", "A01", 
"", "A01", ""), date = c("2015-06-19", "2016-08-15", "2010-03-16", 
"2018-01-20", "2017-02-20", "2017-06-01", "2008-01-11", "2018-03-18", 
"2017-01-19", "2013-01-16", "2009-05-01", "2011-04-03", "2015-05-09"
)), class = "data.frame", row.names = c(NA, -13L))

I followed the steps below and was able to get a frequency table with the mean, but I think there should be an easier way, and I have not yet been able to get the SD. Please help me get the SD per year.

To compute the mean, I added a new column (consult) set to 1 for each consultation (based on icpc):

library(data.table)
library(dplyr)
# every row is one consultation, so consult is simply 1 (numeric) on each row
setDT(df)[, consult := 1]

From there:

# consultation frequency per patient per year
df.freq.year <- df %>%
  mutate(year = format(as.Date(date), "%Y")) %>%  # date is stored as character
  group_by(id, year) %>%
  summarise(frequency = sum(consult))

# mean consultations per patient, per year
df.mean.year <- df.freq.year %>%
  group_by(year) %>%
  summarise(mean = mean(frequency))

# number of patients per year
df.pat <- df %>%
  mutate(year = format(as.Date(date), "%Y")) %>%
  group_by(year) %>%
  summarise(Nbr.patients = n_distinct(id))

I've tried the following (unsuccessfully):

sqrt(var(df.freq.year$frequency, by = "year"))
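The attempt above cannot group anything, because var() has no by argument. One possible base R alternative is to split the frequency vector by year with tapply() and apply sd() to each piece. This is only a sketch: the freq data frame and its values are hypothetical and stand in for df.freq.year.

```r
# Hypothetical per-patient, per-year counts, for illustration only
freq <- data.frame(
  id        = c(1, 2, 3, 1, 2),
  year      = c("2017", "2017", "2017", "2018", "2018"),
  frequency = c(2, 4, 6, 1, 3)
)

# Split frequency by year and compute one summary value per year
sd.by.year   <- tapply(freq$frequency, freq$year, sd)
mean.by.year <- tapply(freq$frequency, freq$year, mean)

print(mean.by.year)
print(sd.by.year)
```

sd() already returns the square root of var(), so the outer sqrt() is not needed.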

My output should look something like this:

    year mean  SD
 1: 2008  5.2 1.3
 2: 2009  4.0 1.1
 3: 2010  8.9 1.6
 4: 2011  4.9 2.1
 5: 2012  3.4 1.1
 6: 2013  2.3 1.1
 7: 2014  9.5 1.3
 8: 2015 12.0 2.1
 9: 2016 11.4 2.6
10: 2017  8.9 2.0
11: 2018  6.7 2.2

Okay, I managed to solve it:

# consultation frequency per patient per year
df.freq.patyear <- df %>%
  mutate(year = format(as.Date(date), "%Y")) %>%  # df has no year column yet
  group_by(id, year) %>%
  summarise(frequency = sum(consult))

#calculate SD per year
df.sd <- df.freq.patyear %>%
  group_by(year) %>%
  summarise(SD = sd(frequency))

df.table <- merge(df.mean.year, df.sd, by = "year")
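For comparison, the same table can probably be built in a single dplyr pipeline, without the consult helper column or the final merge(). The following is a sketch rather than the only way to do it: it assumes dplyr is installed, keeps only the id and date columns of the dummy data (dates as in the printed table), and uses count() to get the consultations per patient per year.

```r
library(dplyr)

# Dummy data from the question, reduced to the two columns that are needed
df <- data.frame(
  id = c(123L, 123L, 332L, 332L, 332L, 100L, 100L,
         113L, 113L, 113L, 113L, 551L, 551L),
  date = c("2015-06-19", "2016-08-15", "2010-03-16", "2018-01-20",
           "2017-02-20", "2017-06-01", "2008-01-11", "2018-03-18",
           "2017-01-19", "2013-01-16", "2009-05-01", "2011-04-03",
           "2015-05-09")
)

df.table <- df %>%
  mutate(year = format(as.Date(date), "%Y")) %>%  # extract the year as text
  count(id, year, name = "frequency") %>%         # consultations per patient per year
  group_by(year) %>%
  summarise(
    Nbr.patients = n(),          # one row per patient within each year
    mean = mean(frequency),
    SD = sd(frequency)
  )

print(df.table)
```

On this dummy data no patient has more than one consultation in the same year, so every mean is 1. The SD is 0 for years with several patients and NA for years with a single patient, because sd() of a single value is NA. Note also that the mean is taken only over patients with at least one consultation in that year.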
