R: How to dcast a subset of 'variable' to a separate column in data.table?
I have a big dataset in a data.table that I am trying to transform. The original dataset is a crosstab with 1 to 2 levels of information in the column names. So I thought I needed to melt everything down, extract the relevant information, then re-cast the individual columns back.
This is where I am hitting a roadblock.
Below is simplified mock data showing what I am trying to do:
Go from:
ID1 ID2 X.Measure1 X.Measure2 Y.Measure1 Y.Measure2
1: 1 1 -0.6264538 0.4874291 -0.62124058 0.82122120
2: 2 2 0.1836433 0.7383247 -2.21469989 0.59390132
3: 3 1 -0.8356286 0.5757814 1.12493092 0.91897737
4: 1 2 1.5952808 -0.3053884 -0.04493361 0.78213630
5: 2 1 0.3295078 1.5117812 -0.01619026 0.07456498
6: 3 2 -0.8204684 0.3898432 0.94383621 -1.98935170
Perform 2 intermediate steps: (i) extract the integers '1' and '2' into a new column 'n'; and (ii) rename 'variable' to 'Y.Measure' (shown below on the left).
The final form is obtained by casting the figures in green, as shown below on the right:
Sample Code:
library( data.table )
library( reshape2 )
library( stringr )
set.seed(1)
DT <- data.table( ID1 = rep( c(1:3),2 ), ID2 = rep( c(1:2),3 ),
X.Measure1 = rnorm(6), X.Measure2 = rnorm(6),
Y.Measure1 = rnorm(6), Y.Measure2 = rnorm(6)
)
Long_DT <- melt( DT, id = c( "ID1", "ID2" ) )
Long_DT[ , n := substr( Long_DT$variable, 10, 10 ) ]
Long_DT[ str_detect( Long_DT$variable, "Y.Measure." ), variable := "Y.Measure" ]
The Problem:
But when I tried dcast with a subset argument, I got the wrong result:
> dcast.data.table ( Long_DT, ID1+ID2 ~ variable, subset = (variable=="Y.Measure") )
Aggregate function missing, defaulting to 'length'
ID1 ID2 Y.Measure
1: 1 1 2
2: 1 2 2
3: 2 1 2
4: 2 2 2
5: 3 1 2
6: 3 2 2
I tried Googling for a solution, but to no avail. I am wondering if my dcast call is wrong or if my approach is wrong to begin with (i.e. there is a much easier way to achieve what I want).
Any help would be most appreciated! Thanks for reading!
UPDATE:
I found the error in my dcast call above. There should have been 'n' on the LHS:
dcast.data.table ( Long_DT, ID1+ID2+n ~ variable, subset = .(variable=="Y.Measure") )
The result would be:
> dcast.data.table ( Long_DT, ID1+ID2+n ~ variable, subset = .(variable=="Y.Measure") )
ID1 ID2 n Y.Measure
1: 1 1 1 -0.62124058
2: 1 1 2 0.82122120
3: 1 2 1 -0.04493361
4: 1 2 2 0.78213630
5: 2 1 1 -0.01619026
6: 2 1 2 0.07456498
7: 2 2 1 -2.21469989
8: 2 2 2 0.59390132
9: 3 1 1 1.12493092
10: 3 1 2 0.91897737
11: 3 2 1 0.94383621
12: 3 2 2 -1.98935170
Unfortunately, X.Measure1 and X.Measure2 also disappeared with the subset, so this doesn't help my overall cause.
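One possible way to keep the X columns as well is to skip the subset and instead split every column name into a stem and its trailing integer, then put the stem on the RHS of the formula. This is only a sketch against the mock data above; the names long, stem and wide are illustrative and not part of the original code.

```r
library(data.table)

set.seed(1)
DT <- data.table(ID1 = rep(1:3, 2), ID2 = rep(1:2, 3),
                 X.Measure1 = rnorm(6), X.Measure2 = rnorm(6),
                 Y.Measure1 = rnorm(6), Y.Measure2 = rnorm(6))

# Melt to long form; variable.factor = FALSE keeps 'variable' as character.
long <- melt(DT, id.vars = c("ID1", "ID2"), variable.factor = FALSE)

# Split e.g. "Y.Measure2" into the integer 2 and the stem "Y.Measure".
long[, n := as.integer(gsub("[^0-9]", "", variable))]
long[, stem := sub("[0-9]+$", "", variable)]

# Every (ID1, ID2, n, stem) combination now occurs exactly once,
# so dcast has nothing to aggregate and does not fall back to 'length'.
wide <- dcast(long, ID1 + ID2 + n ~ stem, value.var = "value")
print(wide)
```

With 'n' on the LHS and both stems on the RHS, the result should have one row per ID1/ID2/n combination and one column each for X.Measure and Y.Measure.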
Below is my modified code with akrun's suggested dcast code:
library( data.table )
library( reshape2 )
library( stringr )
set.seed(1)
DT <- data.table( ID1 = rep( c(1:3),2 ), ID2 = rep( c(1:2),3 ),
X.Measure1 = rnorm(6), X.Measure2 = rnorm(6),
Y.Measure1 = rnorm(6), Y.Measure2 = rnorm(6)
)
Long_DT <- melt( DT, id = c( "ID1", "ID2" ) )
Long_DT[ , n := substr( Long_DT$variable, 10, 10 ) ]
Long_DT[ str_detect( Long_DT$variable, "Y.Measure." ), variable := "Y.Measure" ]
dcast.data.table(Long_DT[, N:=1:.N, variable], ID1+ID2+N~variable, subset = (variable=="Y.Measure") )
Results:
ID1 ID2 N Y.Measure
1: 1 1 1 -0.62124058
2: 1 1 7 0.82122120
3: 1 2 4 -0.04493361
4: 1 2 10 0.78213630
5: 2 1 5 -0.01619026
6: 2 1 11 0.07456498
7: 2 2 2 -2.21469989
8: 2 2 8 0.59390132
9: 3 1 3 1.12493092
10: 3 1 9 0.91897737
11: 3 2 6 0.94383621
12: 3 2 12 -1.98935170
I'm not sure if this is what you're expecting, but I just pushed a new feature to melt.data.table that allows melting into multiple columns now.
You can install the development version by following these instructions. Then you can do:
require(data.table) ## v1.9.5
melt(DT, id=1:2, measure=list(3:4, 5:6),
value.name = c("X.measure", "Y.measure"))
By default, the variable column is populated with numbers. If that's not desirable, just change the levels of the variable column accordingly.
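As a rough illustration of that last point, the sketch below melts into two value columns and then turns the numeric variable column into an integer column n. It assumes a data.table release that provides patterns() for measure.vars (not part of the v1.9.5 snippet above), and the names long and n are illustrative.

```r
library(data.table)

set.seed(1)
DT <- data.table(ID1 = rep(1:3, 2), ID2 = rep(1:2, 3),
                 X.Measure1 = rnorm(6), X.Measure2 = rnorm(6),
                 Y.Measure1 = rnorm(6), Y.Measure2 = rnorm(6))

# Melt the X columns and the Y columns into one value column each.
# patterns() selects measure columns by regular expression (assumed available).
long <- melt(DT, id.vars = c("ID1", "ID2"),
             measure.vars = patterns("^X\\.Measure", "^Y\\.Measure"),
             value.name = c("X.Measure", "Y.Measure"))

# 'variable' indexes the position within each group (1 or 2);
# convert it to an integer column 'n' and drop the original.
long[, n := as.integer(variable)][, variable := NULL]
print(long)
```

Here the position within each pattern happens to match the trailing integer in the column names; if the columns were in a different order, the mapping would need to be set explicitly.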
HTH