Using the dcast function (reshape2) on a large dataset

I have a data frame with dimensions 325,928 x 2.

Below is a very small subset of that data:

Destination = c('A60001', 'A60001','A60001','A60001','A60001','A60001','A60001','A60001',
            'A60001','A60001','A60001','A60001','A60001','A60001','A60001','A60001',
            'A60001','A60001','A60001','A60001','A60001','A60001','A60001','A60001',
            'A60001', 'A60002', 'A60002','A60002','A60002','A60003')
Source = c('AA53', 'AA582', 'AA18', 'AA388', 'AA841', 'AA72', 'AA19', 'AA77', 'AA78', 'AA20', 'AA21',
       'AA12', 'AA412', 'AA634', 'AA591', 'AA859', 'AA157', 'AA254', 'AA167', 'AA176',
       'AA428', 'AA538', 'AA268', 'AA196', 'AA1250', 'AA23', 'AA16', 'AA692', 'AA196',
       'AA22')

df = data.frame(Destination, Source)

> df
   Destination Source
1       A60001   AA53
2       A60001  AA582
3       A60001   AA18
4       A60001  AA388
5       A60001  AA841
6       A60001   AA72
7       A60001   AA19
8       A60001   AA77
9       A60001   AA78
10      A60001   AA20
11      A60001   AA21
12      A60001   AA12
13      A60001  AA412
14      A60001  AA634
15      A60001  AA591
16      A60001  AA859
17      A60001  AA157
18      A60001  AA254
19      A60001  AA167
20      A60001  AA176
21      A60001  AA428
22      A60001  AA538
23      A60001  AA268
24      A60001  AA196
25      A60001 AA1250
26      A60002   AA23
27      A60002   AA16
28      A60002  AA692
29      A60002  AA196
30      A60003   AA22

The ultimate goal is to reshape this data frame into a wide table with one row per Source and one column per Destination. I need something similar to dcast, because reshape2's dcast cannot handle the full dataset.

Here is the original code that I tried with this data frame:

library(reshape2)
test <- dcast(cbind(df, V1 = rep(1, nrow(df))), Source ~ Destination, value.var = 'V1', fun.aggregate = length)

Output:

   Source A60001 A60002 A60003
1    AA12      1      0      0
2  AA1250      1      0      0
3   AA157      1      0      0
4    AA16      0      1      0
5   AA167      1      0      0
6   AA176      1      0      0
7    AA18      1      0      0
8    AA19      1      0      0
9   AA196      1      1      0
10   AA20      1      0      0
11   AA21      1      0      0
12   AA22      0      0      1
13   AA23      0      1      0
14  AA254      1      0      0
15  AA268      1      0      0
16  AA388      1      0      0
17  AA412      1      0      0
18  AA428      1      0      0
19   AA53      1      0      0
20  AA538      1      0      0
21  AA582      1      0      0
22  AA591      1      0      0
23  AA634      1      0      0
24  AA692      0      1      0
25   AA72      1      0      0
26   AA77      1      0      0
27   AA78      1      0      0
28  AA841      1      0      0
29  AA859      1      0      0
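
The wide table above is a contingency table of Source against Destination, so the same counts can be cross-checked with base R's table(). This is a minimal sketch on a hypothetical five-row sample, not the approach used in the post; the names `small` and `counts` are assumptions for illustration.

```r
# Hypothetical five-row sample in the same shape as the data above
small <- data.frame(
  Destination = c("A60001", "A60001", "A60002", "A60002", "A60003"),
  Source      = c("AA53", "AA196", "AA196", "AA23", "AA22"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows are Source values, columns are Destination values,
# and each cell counts how many input rows have that pair
counts <- table(small$Source, small$Destination)
print(counts)
```

Note that table() returns a dense table object, so it faces the same size growth as a dense wide data frame on large inputs.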

It works with the subset shown here, but when I run it on the full 325,928 x 2 dataset, R crashes. Is there a better function that produces the same output but can handle larger amounts of data? If this isn't enough information, I can privately provide the full dataset to anyone who thinks they can solve this (I can't post it here because it is too large for StackOverflow), so you can reproduce the issue directly from the source data.

Any help would be great, thanks!
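
One possible reason the full dataset is harder than the sample is that the wide result has a cell for every Source/Destination combination, whether or not that pair occurs in the data. The sketch below estimates that size before reshaping. `estimate_wide_cells` is a hypothetical helper (not part of reshape2 or data.table), and the memory figure assumes 4 bytes per integer cell.

```r
# Hypothetical helper: estimate the size of a Source x Destination count table
estimate_wide_cells <- function(d) {
  n_rows  <- length(unique(d$Source))       # one output row per distinct Source
  n_cols  <- length(unique(d$Destination))  # one output column per distinct Destination
  n_cells <- as.numeric(n_rows) * n_cols    # as.numeric avoids integer overflow
  list(rows = n_rows, cols = n_cols, cells = n_cells,
       approx_mb = n_cells * 4 / 1024^2)    # assumption: 4 bytes per integer cell
}

# Hypothetical five-row sample in the same shape as the question's data
small <- data.frame(
  Destination = c("A60001", "A60001", "A60002", "A60002", "A60003"),
  Source      = c("AA53", "AA196", "AA196", "AA23", "AA22")
)
est <- estimate_wide_cells(small)
print(est$cells)
```

Running the helper on the full data frame would show whether the dense wide table is likely to fit in memory at all.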

Thanks to @Imo's suggestion, here is a solution:

If your dataset is very large or wide, convert your data frame to a data.table and then use data.table's dcast:

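library(data.table)
# setDT converts df to a data.table by reference (no copy)
df1 <- setDT(df)
# Constant column that dcast spreads into the wide table
df1[, value := 1]
# One row per Source, one column per Destination; absent pairs become 0
trial <- dcast(df1, Source ~ Destination, value.var = "value", fill = 0)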

This gives the same result and can handle large amounts of data.
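
For reference, here is a self-contained sketch of the data.table route on a hypothetical five-row sample. It assumes the data.table package is installed. It also passes fun.aggregate = length explicitly, an assumption carried over from the reshape2 call in the question, so that repeated Source/Destination pairs are counted rather than left to the default behaviour.

```r
library(data.table)

# Hypothetical five-row sample with the same two columns as the question
small <- data.table(
  Destination = c("A60001", "A60001", "A60002", "A60002", "A60003"),
  Source      = c("AA53", "AA196", "AA196", "AA23", "AA22")
)

# Constant column to aggregate over, added by reference
small[, value := 1L]

# Long-to-wide: one row per Source, one column per Destination,
# each cell holding the number of matching input rows
wide <- dcast(small, Source ~ Destination,
              value.var = "value", fun.aggregate = length)
print(wide)
```

With fun.aggregate = length, combinations that never occur come out as 0, because the length of an empty group is 0.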
