

R function to reshape data frame while selecting multiple variables?

I'd like to reshape a data frame from long to wide (dummy data given below). My real data frame has many (40+) numeric variables, but I'd like to select only three of them when reshaping.

Options I have found so far include:

(1) reshape one variable using dcast():

library(reshape2)
dcast(d, group1 + group2 ~ location, value.var = "mass")

(2) reshape all the variables using reshape():

reshape(d, idvar = c("group1", "group2"), timevar = "location", sep = ".", 
        direction = "wide")

(3) create a vector of variables I'd like to exclude, called variables.to.drop, and pass this to reshape():

variables.to.drop <- c("diameter", "volume")

reshape(d, idvar = c("group1", "group2"), timevar = "location", sep = ".", 
        direction = "wide", drop = variables.to.drop)

but I haven't found a function that takes a vector or list of variables to reshape. Basically, a version of dcast() that allows a list or vector to be passed to the argument value.var would fit the bill. Is there a function like this that I haven't come across?

(I realize I can subset the data before doing the reshape operation, but it would be much cleaner to simply specify the variables in the reshaping function, since using subset() would involve having to specify all ID variables to be included, whereas IDs are recognized automatically in most reshaping functions. I could also just use the drop argument in reshape(), as above, but I want to select 3 variables out of 40+, so this is cumbersome.)

d <- structure(list(group1 = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("ripe", "unripe"), class = "factor"), 
group2 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 5L, 5L, 5L, 
5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 
4L, 4L, 4L, 4L, 4L, 4L), .Label = c("apple", "grapefruit", 
"orange", "peach", "pear"), class = "factor"), type = structure(c(2L, 
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("large", 
"small"), class = "factor"), location = structure(c(1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 
3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("P1", 
"P2", "P3"), class = "factor"), diameter = c(17.2, 19.1, 
18.5, 23.3, 22.9, 19.4, 11.1, 11.8, 6.8, 3.2, 7.9, 5.6, 8.4, 
9.2, 9.7, 17.1, 19.4, 18.9, 11.8, 10.6, 10.1, 18.8, 17.9, 
13.2, 8.5, 8.9, 7.2, 10.1, 8.7, 6.6), mass = c(11.1370341130532, 
16.2229940481484, 16.0927473288029, 16.2337944167666, 18.6091538355686, 
16.4031060528941, 10.0949575635605, 12.3255050601438, 16.6608375823125, 
15.1425114134327, 16.9359129178338, 15.4497483558953, 12.8273358359002, 
19.2343348427676, 12.9231584025547, 18.3729562815279, 12.8622328466736, 
12.6682078000158, 11.8672278965823, 12.3222591052763, 13.1661245482974, 
13.0269337072968, 11.590460028965, 10.3999591805041, 12.1879954100586, 
18.1059855245985, 15.2569754677825, 19.1465816600248, 18.3134504687041, 
10.4577026329935), volume = c(39.1218296485022, 35.3037334373221, 
36.0934440605342, 40.1461374014616, 33.6219241656363, 45.1934127090499, 
34.0249607525766, 35.1761963730678, 49.8430083505809, 46.1470468062907, 
41.0666718147695, 42.9281218815595, 36.2364861415699, 42.4363839626312, 
36.5954035148025, 40.0399494590238, 43.5418905457482, 39.6998247830197, 
34.8785765469074, 45.3091957513243, 31.4755976013839, 36.193732037209, 
44.3454348668456, 40.0909182429314, 33.0599791789427, 40.0786697631702, 
39.879218460992, 45.0240039406344, 33.4929964784533, 46.9678482087329
)), .Names = c("group1", "group2", "type", "location", "diameter", 
"mass", "volume"), row.names = c(NA, -30L), class = "data.frame")

You have resisted adding details needed to allow a specific answer (names of the columns in question and an example or explicit structure of the desired output), but perhaps this is what you are asking to be done. If you want to drop a specific set of columns from a data frame prior to reshaping, it is reasonably straightforward:

 colNamesToBeDropped <- c("colnam1","colnam2","colnam3","colnam4","colnam5")
 colsToBeKept <- ! names(dfrm) %in% colNamesToBeDropped
 reshape( dfrm[ , colsToBeKept] , .... rest of parameters ...  )

# Using a dataset that is on everyone's machine
> state.x77 <- as.data.frame(state.x77)
> str(state.x77)
'data.frame':   50 obs. of  8 variables:
 $ Population: num  3615 365 2212 2110 21198 ...
 $ Income    : num  3624 6315 4530 3378 5114 ...
 $ Illiteracy: num  2.1 1.5 1.8 1.9 1.1 0.7 1.1 0.9 1.3 2 ...
 $ Life Exp  : num  69 69.3 70.5 70.7 71.7 ...
 $ Murder    : num  15.1 11.3 7.8 10.1 10.3 6.8 3.1 6.2 10.7 13.9 ...
 $ HS Grad   : num  41.3 66.7 58.1 39.9 62.6 63.9 56 54.6 52.6 40.6 ...
 $ Frost     : num  20 152 15 65 20 166 139 103 11 60 ...
 $ Area      : num  50708 566432 113417 51945 156361 ...
> colNamesToBeDropped <- c("Income", "Murder")
>      colsToBeKept <- ! names(state.x77) %in% colNamesToBeDropped
>      str( state.x77[ , colsToBeKept] )
'data.frame':   50 obs. of  6 variables:
 $ Population: num  3615 365 2212 2110 21198 ...
 $ Illiteracy: num  2.1 1.5 1.8 1.9 1.1 0.7 1.1 0.9 1.3 2 ...
 $ Life Exp  : num  69 69.3 70.5 70.7 71.7 ...
 $ HS Grad   : num  41.3 66.7 58.1 39.9 62.6 63.9 56 54.6 52.6 40.6 ...
 $ Frost     : num  20 152 15 65 20 166 139 103 11 60 ...
 $ Area      : num  50708 566432 113417 51945 156361 ...
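
The same idea can be turned around so that you name the few columns you want to keep rather than the many you want to drop. Below is a minimal sketch, assuming base R only: the helper reshape_keep() and the toy data frame are hypothetical (they are not part of base R or reshape2, and the values are made up), and the helper simply computes the drop vector with setdiff() before calling reshape() with the same arguments used in the question.

```r
# Toy long-format data using the question's column names; values are made up.
toy <- expand.grid(group1 = c("ripe", "unripe"),
                   group2 = c("apple", "pear"),
                   location = c("P1", "P2", "P3"),
                   stringsAsFactors = FALSE)
toy$diameter <- seq_len(nrow(toy))
toy$mass     <- seq_len(nrow(toy)) * 10
toy$volume   <- seq_len(nrow(toy)) * 100

# Hypothetical helper (not part of base R or reshape2): reshape to wide,
# keeping only the measure columns named in `keep`.
reshape_keep <- function(data, idvar, timevar, keep, sep = ".") {
  # Columns that are neither IDs, the time variable, nor wanted measures
  to.drop <- setdiff(names(data), c(idvar, timevar, keep))
  reshape(data, idvar = idvar, timevar = timevar, sep = sep,
          direction = "wide", drop = to.drop)
}

wide <- reshape_keep(toy, idvar = c("group1", "group2"),
                     timevar = "location", keep = c("mass", "volume"))
names(wide)
```

With the question's data this would presumably be called as reshape_keep(d, c("group1", "group2"), "location", c("mass", "volume")). The ID and time columns still have to be named once, but the 40+ unwanted measure columns never need to be listed.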
