
Aggregate and reshape from long to wide

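I asked this question before, but the answers I got were not what I wanted. At the time I used Stata for this task. However, since I work with this kind of data frequently, I would like to do it in R. I have a dataset of daily hospital admissions broken down by age, sex and diagnosis, and I want to aggregate it and reshape it from long to wide. How can I do this? Sample data and the desired output are shown below; the column headers are built from prefixes for sex, age and diagnosis. Thanks.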

Sample data

df <- structure(list(diag = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L), .Label = c("card", "cere"), class = "factor"), sex = structure(c(1L, 
1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("Female", "Male"), class = "factor"), 
    age = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
    1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("35-64", 
    "65-74"), class = "factor"), admissions = c(1L, 1L, 0L, 0L, 
    6L, 6L, 6L, 1L, 4L, 0L, 0L, 0L, 4L, 6L, 5L, 2L, 2L, 4L, 1L, 
    0L, 6L, 5L, 6L, 4L), bdate = structure(c(1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L), .Label = c("1987-01-01", "1987-01-02", 
    "1987-01-03"), class = "factor")), .Names = c("diag", "sex", 
"age", "admissions", "bdate"), row.names = c(NA, -24L), class = "data.frame")

Desired output

structure(list(date = structure(1:3, .Label = c("01jan1987", 
"02jan1987", "03jan1987"), class = "factor"), f3564card = c(1L, 
4L, 2L), f6574card = c(1L, 0L, 4L), m3564card = c(0L, 0L, 1L), 
    m6574card = c(0L, 0L, 0L), f3564cere = c(6L, 4L, 6L), f6574cere = c(6L, 
    6L, 5L), m3564cere = c(6L, 5L, 6L), m6574cere = c(1L, 2L, 
    4L)), .Names = c("date", "f3564card", "f6574card", "m3564card", 
"m6574card", "f3564cere", "f6574cere", "m3564cere", "m6574cere"
), class = "data.frame", row.names = c(NA, -3L))

Your data is already in long format, so it can be reshaped easily with the "reshape2" package:

library(reshape2)  # dcast() comes from reshape2, not the older reshape package
dcast(df, bdate ~ sex + age + diag, value.var = "admissions")
#        bdate Female_35-64_card Female_35-64_cere Female_65-74_card Female_65-74_cere
# 1 1987-01-01                 1                 6                 1                 6
# 2 1987-01-02                 4                 4                 0                 6
# 3 1987-01-03                 2                 6                 4                 5
#   Male_35-64_card Male_35-64_cere Male_65-74_card Male_65-74_cere
# 1               0               6               0               1
# 2               0               5               0               2
# 3               1               6               0               4
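The dcast() call above produces columns such as Female_35-64_card, whereas the question asks for f3564card-style names, with all "card" columns before all "cere" columns. A minimal sketch, assuming reshape2 is installed: put diag first in the formula so the column order matches, then rebuild each name from its parts. The sample data is regenerated with expand.grid() instead of the dput() output, and the renaming step is an illustrative helper, not part of reshape2.

```r
library(reshape2)

# Recreate the question's sample data: expand.grid() varies its first
# argument fastest, which matches the row order of the dput() output
df <- expand.grid(age   = c("35-64", "65-74"),
                  sex   = c("Female", "Male"),
                  diag  = c("card", "cere"),
                  bdate = c("1987-01-01", "1987-01-02", "1987-01-03"))
df$admissions <- c(1, 1, 0, 0, 6, 6, 6, 1, 4, 0, 0, 0,
                   4, 6, 5, 2, 2, 4, 1, 0, 6, 5, 6, 4)

# diag first in the formula -> all "card" columns, then all "cere" columns
wide <- dcast(df, bdate ~ diag + sex + age, value.var = "admissions")

# Turn "card_Female_35-64" into "f3564card": sex initial + age digits + diag
parts <- strsplit(names(wide)[-1], "_", fixed = TRUE)
names(wide)[-1] <- vapply(parts, function(p) {
  paste0(tolower(substr(p[2], 1, 1)), gsub("-", "", p[3], fixed = TRUE), p[1])
}, character(1))
names(wide)[1] <- "date"
wide
```

Putting diag first in the formula is what makes the column order match the desired output; the date column keeps its ISO form rather than Stata's 01jan1987 style.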

I don't see any aggregation in your sample output, but if you do need it, dcast can aggregate through its fun.aggregate argument.
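As an illustration of fun.aggregate, suppose (hypothetically) you wanted total admissions per day and diagnosis, pooled over sex and age. Leaving sex and age out of the formula leaves four rows in each bdate/diag cell, and fun.aggregate = sum collapses them into one number. This sketch assumes reshape2 is installed and regenerates the question's sample data with expand.grid().

```r
library(reshape2)

# The question's sample data (expand.grid varies age fastest, as in the dput)
df <- expand.grid(age   = c("35-64", "65-74"),
                  sex   = c("Female", "Male"),
                  diag  = c("card", "cere"),
                  bdate = c("1987-01-01", "1987-01-02", "1987-01-03"))
df$admissions <- c(1, 1, 0, 0, 6, 6, 6, 1, 4, 0, 0, 0,
                   4, 6, 5, 2, 2, 4, 1, 0, 6, 5, 6, 4)

# sex and age are not in the formula, so each (bdate, diag) cell holds
# four rows; fun.aggregate = sum adds them up
totals <- dcast(df, bdate ~ diag, value.var = "admissions", fun.aggregate = sum)
totals
# card: 2, 4, 7   cere: 19, 17, 21
```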

df <- read.table("D:/Programacao/R/Stackoverflow/Nova pasta/sample.csv",
                 header = TRUE, dec = ".", sep = ",",
                 stringsAsFactors = FALSE)
head(df)
       date    sex cvd ACS   age
1 01 Jul 91 female   0   0 35-64
2 01 Jul 91   male   0   0 35-64
3 01 Jul 91 female   0   0 35-64
4 01 Jul 91   male   1   1 35-64
5 01 Jul 91 female   0   0 65-74
6 02 Jul 91   male   0   0 65-74

Since cvd and ACS are not mutually exclusive, sum each of them separately within every date, sex and age group:

library(dplyr)
df %>%  # %.% was removed from dplyr; use the %>% pipe instead
  group_by(date, sex, age) %>%
  summarise(vcvd = sum(cvd),
            vacs = sum(ACS))
Source: local data frame [111 x 5]
Groups: date, sex

        date    sex   age vcvd vacs
1  01 Jul 91 female 35-64    0    0
2  01 Jul 91 female 65-74    0    0
3  01 Jul 91   male 35-64    1    1
4  02 Aug 91 female 35-64    0    0
5  02 Jul 91 female 65-74    1    0
6  02 Jul 91   male 65-74    0    0
7  03 Aug 91 female 65-74    0    0
8  03 Jul 91 female 35-64    0    0
9  04 Jul 91   male 35-64    1    0
10 04 Jul 91   male 65-74    0    0
..       ...    ...   ...  ...  ...
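The dplyr summary above is still long (one row per date, sex and age). One way to finish the reshape to wide is sketched below, assuming dplyr (>= 1.0, for the .groups argument) and tidyr (>= 1.0, for pivot_wider()) are installed. Because the CSV itself is not available, it uses only the six rows printed by head(df) as stand-in data; the vcvd_/vacs_ column prefixes come from pivot_wider()'s default naming, not from the original answer.

```r
library(dplyr)
library(tidyr)

# Stand-in data: the six rows shown by head(df), not the full CSV
df <- data.frame(
  date = c(rep("01 Jul 91", 5), "02 Jul 91"),
  sex  = c("female", "male", "female", "male", "female", "male"),
  cvd  = c(0, 0, 0, 1, 0, 0),
  ACS  = c(0, 0, 0, 1, 0, 0),
  age  = c("35-64", "35-64", "35-64", "35-64", "65-74", "65-74")
)

wide <- df %>%
  group_by(date, sex, age) %>%
  summarise(vcvd = sum(cvd), vacs = sum(ACS), .groups = "drop") %>%
  # One column per measure x sex x age, e.g. "vcvd_male_35-64";
  # combinations absent on a given date are filled with 0
  pivot_wider(names_from = c(sex, age),
              values_from = c(vcvd, vacs),
              values_fill = 0)
wide
```

Because date is read as character, rows sort as text (the output above lists 02 Aug 91 before 02 Jul 91); converting it first with as.Date(df$date, format = "%d %b %y") should give chronological order, assuming an English locale for the month abbreviations.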


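Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.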

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM