
R function to reshape data frame while selecting multiple variables?

I'd like to reshape a data frame from long to wide (dummy data given below). My real data frame has many (40+) numeric variables, but I'd like to select only three of the latter when reshaping.

Options I have found so far include:

(1) reshape one variable using dcast():

library(reshape2)
dcast(d, group1 + group2 ~ location, value.var = "mass")

(2) reshape all the variables using reshape():

reshape(d, idvar = c("group1", "group2"), timevar = "location", sep = ".", 
        direction = "wide")

(3) create a vector of the variables I'd like to exclude, called variables.to.drop, and pass it to reshape():

variables.to.drop <- c("diameter", "volume")

reshape(d, idvar = c("group1", "group2"), timevar = "location", sep = ".", 
        direction = "wide", drop = variables.to.drop)

but I haven't found a function that takes a vector or list of variables to reshape. Basically a version of dcast() that allows a list or vector to be passed to the argument value.var would fit the bill. Is there a function like this that I haven't come across?

(I realize I can subset the data before reshaping, but it would be much cleaner to specify the variables in the reshaping function itself: subset() requires listing every ID variable to keep, whereas most reshaping functions recognize IDs automatically. I could also use the drop argument of reshape(), as above, but selecting 3 variables out of 40+ that way is cumbersome.)
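
One way to make the drop approach in option (3) less cumbersome is to build the drop vector from the variables to keep, using setdiff(). The sketch below is only an illustration: the data frame toy and the names id.vars, time.var and variables.to.keep are made up for this example, and the values are not the real data.

```r
# Made-up long-format data; column names follow the question, values do not.
toy <- data.frame(
  group1   = rep(c("ripe", "unripe"), each = 3),
  group2   = "apple",
  type     = "small",
  location = rep(c("P1", "P2", "P3"), times = 2),
  diameter = c(17.2, 19.1, 18.5, 23.3, 22.9, 19.4),
  mass     = c(11.1, 16.2, 16.1, 16.2, 18.6, 16.4),
  volume   = c(39.1, 35.3, 36.1, 40.1, 33.6, 45.2)
)

id.vars           <- c("group1", "group2")  # assumed ID columns
time.var          <- "location"             # column that is spread into wide columns
variables.to.keep <- c("diameter", "mass")  # the only measures wanted in the result

# Every column that is neither an ID, the time variable, nor a wanted measure
variables.to.drop <- setdiff(names(toy), c(id.vars, time.var, variables.to.keep))

wide <- reshape(toy, idvar = id.vars, timevar = time.var, sep = ".",
                direction = "wide", drop = variables.to.drop)
names(wide)
```

With 40+ columns only the short variables.to.keep vector has to be written out; the ID and time columns are named once and reused in the reshape() call.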

d <- structure(list(group1 = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("ripe", "unripe"), class = "factor"), 
group2 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 
5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 
4L, 4L, 4L, 4L, 4L, 4L), .Label = c("apple", "grapefruit", 
"orange", "peach", "pear"), class = "factor"), type = structure(c(2L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("large", 
"small"), class = "factor"), location = structure(c(1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("P1", 
"P2", "P3"), class = "factor"), diameter = c(17.2, 19.1, 
18.5, 23.3, 22.9, 19.4, 11.1, 11.8, 6.8, 3.2, 7.9, 5.6, 8.4, 
9.2, 9.7, 17.1, 19.4, 18.9, 11.8, 10.6, 10.1, 18.8, 17.9, 
13.2, 8.5, 8.9, 7.2, 10.1, 8.7, 6.6), mass = c(11.1370341130532, 
16.2229940481484, 16.0927473288029, 16.2337944167666, 18.6091538355686, 
16.4031060528941, 10.0949575635605, 12.3255050601438, 16.6608375823125, 
15.1425114134327, 16.9359129178338, 15.4497483558953, 12.8273358359002, 
19.2343348427676, 12.9231584025547, 18.3729562815279, 12.8622328466736, 
12.6682078000158, 11.8672278965823, 12.3222591052763, 13.1661245482974, 
13.0269337072968, 11.590460028965, 10.3999591805041, 12.1879954100586, 
18.1059855245985, 15.2569754677825, 19.1465816600248, 18.3134504687041, 
10.4577026329935), volume = c(39.1218296485022, 35.3037334373221, 
36.0934440605342, 40.1461374014616, 33.6219241656363, 45.1934127090499, 
34.0249607525766, 35.1761963730678, 49.8430083505809, 46.1470468062907, 
41.0666718147695, 42.9281218815595, 36.2364861415699, 42.4363839626312, 
36.5954035148025, 40.0399494590238, 43.5418905457482, 39.6998247830197, 
34.8785765469074, 45.3091957513243, 31.4755976013839, 36.193732037209, 
44.3454348668456, 40.0909182429314, 33.0599791789427, 40.0786697631702, 
39.879218460992, 45.0240039406344, 33.4929964784533, 46.9678482087329
)), .Names = c("group1", "group2", "type", "location", "diameter", 
"mass", "volume"), row.names = c(NA, -30L), class = "data.frame")

You have resisted adding the details needed for a specific answer (the names of the columns in question and an example or explicit structure of the desired output), but perhaps this is what you are asking for. If you want to drop a specific set of columns from a data frame before reshaping it, that is reasonably straightforward:

 colNamesToBeDropped <- c("colnam1","colnam2","colnam3","colnam4","colnam5")
 # logical vector: TRUE for every column whose name is not listed above
 colsToBeKept <- ! names(dfrm) %in% colNamesToBeDropped
 reshape( dfrm[ , colsToBeKept] , .... rest of parameters ...  )

# Using a dataset that ships with R (state.x77, from the datasets package)
> state.x77 <- as.data.frame(state.x77)
> str(state.x77)
'data.frame':   50 obs. of  8 variables:
 $ Population: num  3615 365 2212 2110 21198 ...
 $ Income    : num  3624 6315 4530 3378 5114 ...
 $ Illiteracy: num  2.1 1.5 1.8 1.9 1.1 0.7 1.1 0.9 1.3 2 ...
 $ Life Exp  : num  69 69.3 70.5 70.7 71.7 ...
 $ Murder    : num  15.1 11.3 7.8 10.1 10.3 6.8 3.1 6.2 10.7 13.9 ...
 $ HS Grad   : num  41.3 66.7 58.1 39.9 62.6 63.9 56 54.6 52.6 40.6 ...
 $ Frost     : num  20 152 15 65 20 166 139 103 11 60 ...
 $ Area      : num  50708 566432 113417 51945 156361 ...
> colNamesToBeDropped <- c("Income", "Murder")
>      colsToBeKept <- ! names(state.x77) %in% colNamesToBeDropped
>      str( state.x77[ , colsToBeKept] )
'data.frame':   50 obs. of  6 variables:
 $ Population: num  3615 365 2212 2110 21198 ...
 $ Illiteracy: num  2.1 1.5 1.8 1.9 1.1 0.7 1.1 0.9 1.3 2 ...
 $ Life Exp  : num  69 69.3 70.5 70.7 71.7 ...
 $ HS Grad   : num  41.3 66.7 58.1 39.9 62.6 63.9 56 54.6 52.6 40.6 ...
 $ Frost     : num  20 152 15 65 20 166 139 103 11 60 ...
 $ Area      : num  50708 566432 113417 51945 156361 ...
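
The same idea can be turned around so that the columns to keep are named rather than the ones to drop. The following is a minimal sketch, assuming long-format data shaped like the question's; reshapeSelected() is a hypothetical helper written for this example, not a function from base R or reshape2, and the small data frame is made up.

```r
# Hypothetical helper: keep only the ID columns, the time column and the
# requested measures, then hand the reduced data frame to reshape().
reshapeSelected <- function(dfrm, idvar, timevar, measures, sep = ".") {
  colsToBeKept <- names(dfrm) %in% c(idvar, timevar, measures)
  reshape(dfrm[, colsToBeKept, drop = FALSE],
          idvar = idvar, timevar = timevar, sep = sep, direction = "wide")
}

# Made-up long-format data using the question's column names
dfrm <- data.frame(
  group1   = rep(c("ripe", "unripe"), times = 3),
  group2   = rep("pear", 6),
  location = rep(c("P1", "P2", "P3"), each = 2),
  diameter = c(11.1, 3.2, 11.8, 7.9, 6.8, 5.6),
  mass     = c(10.1, 15.1, 12.3, 16.9, 16.7, 15.4),
  volume   = c(34.0, 46.1, 35.2, 41.1, 49.8, 42.9)
)

wide <- reshapeSelected(dfrm, idvar = c("group1", "group2"),
                        timevar = "location", measures = c("mass", "volume"))
names(wide)
```

The ID columns still have to be named once, but that is needed for the idvar argument of reshape() anyway, so the only extra input is the vector of measures.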


 