简体   繁体   中英

Reshape data frame from long to wide format

I have a problem like that. I have a database like:

Province       cases        year      month 
Newyork         10          2000         1
Newyork         20          2000         2
Newyork         30          2000         3
Newyork         40          2000         4
Los Angeles     30          2000         1
Los Angeles     40          2000         2
Los Angeles     50          2000         3
Los Angeles     60          2000         4

A very big data for 20 years and many Provinces. How can I regroup my data to get an sequence of time like that:

Province      cases.at.1.2000  cases.at.2.2000  cases.at.3.2000  cases.at.4.2000  
Newyork             10               20                30               40
Los Angeles         30               40                50               60

Just use dcast from reshape2 package:

library(reshape2)

dcast(df, Province~month+year, value.var='cases')
#    Province 1_2000 2_2000 3_2000 4_2000
#1 LosAngeles     30     40     50     60
#2    Newyork     10     20     30     40

Data:

df=structure(list(Province = c("Newyork", "Newyork", "Newyork", 
"Newyork", "LosAngeles", "LosAngeles", "LosAngeles", "LosAngeles"
), cases = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L), year = c(2000L, 
2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L), month = c(1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L)), .Names = c("Province", "cases", 
"year", "month"), class = "data.frame", row.names = c(NA, -8L
))

Edit: if you have missing month/province, you can still use dcast :

#     Province cases year month
#1     Newyork    10 2000     1
#2     Newyork    20 2000     2
#3     Newyork    30 2000     3
#4     Newyork    40 2000     4
#5  LosAngeles    30 2000     1
#6  LosAngeles    40 2000     2
#7  LosAngeles    50 2000     3
#8  LosAngeles    60 2000     4
#9     Newyork    99 2000     5
#10   SanDiego    99 2000     5

dcast(df, Province~month+year, value.var='cases')

#    Province 1_2000 2_2000 3_2000 4_2000 5_2000
#1 LosAngeles     30     40     50     60     NA
#2    Newyork     10     20     30     40     99
#3   SanDiego     NA     NA     NA     NA     99

We can use reshape from base R after joining the 'month' and 'year' columns ( paste(...) )

 reshape(
    transform(df1, yearmonth=paste('at', month, year, sep="."))[,-(3:4)], 
       idvar='Province', timevar='yearmonth', direction='wide')
#  Province cases.at.1.2000 cases.at.2.2000 cases.at.3.2000    cases.at.4.2000
# 1    Newyork              10              20              30              40
# 5 Los Angeles             30              40              50              60

data

df1 <- structure(list(Province = c("Newyork", "Newyork", "Newyork", 
"Newyork", "Los Angeles", "Los Angeles", "Los Angeles", "Los Angeles"
), cases = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L), year = c(2000L, 
2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L), month = c(1L, 
 2L, 3L, 4L, 1L, 2L, 3L, 4L)), .Names = c("Province", "cases", 
"year", "month"), class = "data.frame", row.names = c(NA, -8L))

Based on @Ananda Mahto suggestion:

library(tidyr); library(dplyr)

df %>% mutate(month = paste0("cases.at.", month)) %>%  
  unite(key, month, year, sep=".") %>% spread(key, cases)

If you have missing month - year for some Province, use expand:

df %>% expand(Province, year, month) %>% left_join(df) %>% 
  mutate(month = paste0("cases.at.", month)) %>%  
  unite(key, month, year, sep=".") %>% spread(key, cases)

Data:

df=structure(list(Province = c("Newyork", "Newyork", "Newyork", 
  "Newyork", "LosAngeles", "LosAngeles", "LosAngeles", "LosAngeles", "SanDiego"), 
  cases = c(10L, 20L, 30L, 40L, 30L, 40L, 50L, 60L, 90L), year = c(2000L, 
  2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L), month = c(1L, 
  2L, 3L, 4L, 1L, 2L, 3L, 4L, 4L)), .Names = c("Province", "cases", 
  "year", "month"), class = "data.frame", row.names = c(NA, -9L))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM