简体   繁体   中英

Reshaping large dataset in R

I am trying to reshape a large dataset and have a problem not getting the results in the right order as I want to.

Here is what the data looks like:

GeoFIPS GeoName IndustryID  Description X2001   X2002   X2003   X2004   X2005 
10180   Abilene, TX     21  Mining      96002   92407   127138 150449   202926
10180   Abilene, TX     22  Utilities   33588   34116   33105   33265   32452
...

The dataframe is very long and includes all MSAs in the US with selected Industry sectors.

The way I wanted it to look like is this:

GeoFIPS GeoName        Year     Mining Utilities (etc)
10180   Abilene, TX    2001     96002   33588
10180   Abilene, TX    2002     92407   34116
....

I am pretty new to R and would really appreciate your help. I have checked wide to long and long to wide, but this seems to be a more complex situation. Thank you!

Edit: Data

df1 <- structure(list(GeoFIPS = c(10180L, 10180L), GeoName =
c("Abilene, TX", 
"Abilene, TX"), IndustryID = 21:22, Description = c("Mining", 
"Utilities"), X2001 = c(96002L, 33588L), X2002 = c(92407L, 34116L
), X2003 = c(127138L, 33105L), X2004 = c(150449L, 33265L), X2005 = 
c(202926L, 
 32452L)), .Names = c("GeoFIPS", "GeoName", "IndustryID", "Description", 
"X2001", "X2002", "X2003", "X2004", "X2005"), class = "data.frame",
 row.names = c(NA, -2L))

You could use melt/dcast from reshape2

library(reshape2)
df2 <- melt(df1, id.var=c('GeoFIPS', 'GeoName', 
               'IndustryID', 'Description'))
df2 <- transform(df2, Year=sub('^X', '', variable))[-c(3,5)]


dcast(df2, ...~Description, value.var='value')
#  GeoFIPS     GeoName Year Mining Utilities
#1   10180 Abilene, TX 2001  96002     33588
#2   10180 Abilene, TX 2002  92407     34116
#3   10180 Abilene, TX 2003 127138     33105
#4   10180 Abilene, TX 2004 150449     33265
#5   10180 Abilene, TX 2005 202926     32452

data

df1 <- structure(list(GeoFIPS = c(10180L, 10180L), GeoName =
c("Abilene, TX", 
"Abilene, TX"), IndustryID = 21:22, Description = c("Mining", 
"Utilities"), X2001 = c(96002L, 33588L), X2002 = c(92407L, 34116L
), X2003 = c(127138L, 33105L), X2004 = c(150449L, 33265L), X2005 = 
c(202926L, 
 32452L)), .Names = c("GeoFIPS", "GeoName", "IndustryID", "Description", 
"X2001", "X2002", "X2003", "X2004", "X2005"), class = "data.frame",
 row.names = c(NA, -2L))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM