Reshape data frame from long to wide format
I have the following problem. I have a data set such as:
Province cases year month
Newyork 10 2000 1
Newyork 20 2000 2
Newyork 30 2000 3
Newyork 40 2000 4
Los Angeles 30 2000 1
Los Angeles 40 2000 2
Los Angeles 50 2000 3
Los Angeles 60 2000 4
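The real data set is very large, covering 20 years and many provinces. How can I reshape my data to get a time series layout like the one shown below: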
Province cases.at.1.2000 cases.at.2.2000 cases.at.3.2000 cases.at.4.2000
Newyork 10 20 30 40
Los Angeles 30 40 50 60
Just use dcast from the reshape2 package:
library(reshape2)
dcast(df, Province~month+year, value.var='cases')
# Province 1_2000 2_2000 3_2000 4_2000
#1 LosAngeles 30 40 50 60
#2 Newyork 10 20 30 40
Data:
df=structure(list(Province = c("Newyork", "Newyork", "Newyork",
"Newyork", "LosAngeles", "LosAngeles", "LosAngeles", "LosAngeles"
), cases = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L), year = c(2000L,
2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L), month = c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L)), .Names = c("Province", "cases",
"year", "month"), class = "data.frame", row.names = c(NA, -8L
))
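The dcast output above names its columns 1_2000, 2_2000 and so on, while the question asks for cases.at.1.2000. A minimal sketch of one way to rename them afterwards, assuming the same df as above (rebuilt here with data.frame() so the block runs on its own) and an illustrative object name wide:

```r
library(reshape2)

# Same values as the df above, rebuilt so this block is self-contained
df <- data.frame(
  Province = rep(c("Newyork", "LosAngeles"), each = 4),
  cases = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L),
  year = 2000L,
  month = rep(1:4, times = 2)
)

wide <- dcast(df, Province ~ month + year, value.var = "cases")

# dcast joins month and year with "_"; turn "1_2000" into "cases.at.1.2000"
names(wide)[-1] <- paste0("cases.at.", sub("_", ".", names(wide)[-1], fixed = TRUE))
wide
```

The first column (Province) is skipped by the [-1] index, so only the value columns are renamed.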
Edit: if some months/provinces are missing, you can still use dcast:
# Province cases year month
#1 Newyork 10 2000 1
#2 Newyork 20 2000 2
#3 Newyork 30 2000 3
#4 Newyork 40 2000 4
#5 LosAngeles 30 2000 1
#6 LosAngeles 40 2000 2
#7 LosAngeles 50 2000 3
#8 LosAngeles 60 2000 4
#9 Newyork 99 2000 5
#10 SanDiego 99 2000 5
dcast(df, Province~month+year, value.var='cases')
# Province 1_2000 2_2000 3_2000 4_2000 5_2000
#1 LosAngeles 30 40 50 60 NA
#2 Newyork 10 20 30 40 99
#3 SanDiego NA NA NA NA 99
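If a missing month should be read as zero cases rather than as unknown, dcast accepts a fill argument. This is only a sketch: whether 0 is the right value depends on what a missing row means in your data, and the object name wide0 is illustrative.

```r
library(reshape2)

# Same values as the extended df above, rebuilt so this block is self-contained
df <- data.frame(
  Province = c(rep("Newyork", 4), rep("LosAngeles", 4), "Newyork", "SanDiego"),
  cases = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L, 99L, 99L),
  year = 2000L,
  month = c(1:4, 1:4, 5L, 5L)
)

# fill = 0 replaces the NA cells created by missing Province/month combinations
wide0 <- dcast(df, Province ~ month + year, value.var = "cases", fill = 0)
wide0
```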
After combining the 'month' and 'year' columns (with paste(...)), we can use reshape from base R:
reshape(
transform(df1, yearmonth=paste('at', month, year, sep="."))[,-(3:4)],
idvar='Province', timevar='yearmonth', direction='wide')
# Province cases.at.1.2000 cases.at.2.2000 cases.at.3.2000 cases.at.4.2000
# 1 Newyork 10 20 30 40
# 5 Los Angeles 30 40 50 60
df1 <- structure(list(Province = c("Newyork", "Newyork", "Newyork",
"Newyork", "Los Angeles", "Los Angeles", "Los Angeles", "Los Angeles"
), cases = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L), year = c(2000L,
2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L), month = c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L)), .Names = c("Province", "cases",
"year", "month"), class = "data.frame", row.names = c(NA, -8L))
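With several years of data, the column order matters for a time series. reshape() creates the wide columns in the order in which the timevar values first appear, so sorting by year and month beforehand should give chronological columns. A sketch with a small hypothetical two-year data set (the values and the names df2, sorted and wide2 are made up for illustration):

```r
# Hypothetical data: two provinces, two years, two months, deliberately unsorted
df2 <- data.frame(
  Province = rep(c("Newyork", "Los Angeles"), each = 4),
  cases = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L),
  year = rep(c(2000L, 2001L), times = 4),
  month = rep(c(1L, 1L, 2L, 2L), times = 2)
)

# Sort chronologically so the wide columns come out in time order
sorted <- df2[order(df2$year, df2$month), ]

wide2 <- reshape(
  transform(sorted, yearmonth = paste("at", month, year, sep = "."))[
    , c("Province", "cases", "yearmonth")],
  idvar = "Province", timevar = "yearmonth", direction = "wide")
names(wide2)
```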
Based on @Ananda Mahto's suggestion:
library(tidyr); library(dplyr)
df %>% mutate(month = paste0("cases.at.", month)) %>%
unite(key, month, year, sep=".") %>% spread(key, cases)
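In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(), which can build the column names directly. A sketch under the assumption that a recent tidyr is installed, using the data from the question (the object name wide is illustrative):

```r
library(tidyr)

# Rebuilt here so this block is self-contained
df <- data.frame(
  Province = rep(c("Newyork", "LosAngeles"), each = 4),
  cases = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L),
  year = 2000L,
  month = rep(1:4, times = 2)
)

# month and year are joined with names_sep, then names_prefix is prepended
wide <- pivot_wider(
  df,
  names_from = c(month, year),
  values_from = cases,
  names_prefix = "cases.at.",
  names_sep = "."
)
wide
```

Combinations that are absent from the data are left as NA by default, and the values_fill argument can supply a different value.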
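If a month-year combination is missing for some province, use expand:
# expand() builds every Province/year/month combination; the join adds NA for absent ones
df %>% expand(Province, year, month) %>%
left_join(df, by = c("Province", "year", "month")) %>%
mutate(month = paste0("cases.at.", month)) %>%
unite(key, month, year, sep=".") %>% spread(key, cases)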
Data:
df=structure(list(Province = c("Newyork", "Newyork", "Newyork",
"Newyork", "LosAngeles", "LosAngeles", "LosAngeles", "LosAngeles", "SanDiego"),
cases = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L, 90L), year = c(2000L,
2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L), month = c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L, 4L)), .Names = c("Province", "cases",
"year", "month"), class = "data.frame", row.names = c(NA, -9L))