
Editing a 10gb file using limited main memory in C/C++

I need to sort a 10gb file containing a list of numbers as fast as possible, using only 100mb of memory. I'm breaking the file into chunks and then merging them.

I am currently using C file pointers, as they are faster than C++ file I/O (at least on my system).

I tried a 1gb file and my code works fine, but it throws a segmentation fault as soon as I call fscanf after opening the 10gb file.

FILE *fin;
FILE *fout;
fin = fopen( filename, "r" );
/* the segfault is reported at the fscanf below; one likely cause is an
   unchecked fopen, which can fail on the 10gb file without large file
   support and leave fin NULL */
if( fin == NULL ) {
    perror( filename );
    return 1;
}
while( 1 ) {
    /* fscanf returns the number of items converted, so test == 1
       rather than != EOF (which loops forever on malformed input) */
    for( i = 0; i < MAX && fscanf( fin, "%d", &temp ) == 1; i++ ) {
        v[i] = temp;
    }

What should I use instead?

And do you have any suggestions about the best way to go about this?

There is a special class of algorithms for this called external sorting. There is a variant of merge sort that works as an external sorting algorithm (search for "merge sort tape").
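To make the chunk-and-merge idea concrete, here is a minimal sketch of an external merge sort in C. The file names (input.txt, runN.txt, sorted.txt) and the chunk size are placeholders, not anything from the question: a real implementation would size CHUNK close to the 100mb budget and merge the runs with a heap, possibly in several passes, so it never exceeds the OS open-file limit.

/* Sketch of external merge sort, assuming whitespace-separated
 * decimal integers.  Phase 1 sorts chunks that fit in memory into
 * temporary run files; phase 2 does a naive k-way merge over them. */
#include <stdio.h>
#include <stdlib.h>

#define CHUNK (1 << 20)            /* ints per in-memory chunk (~4 MB) */

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void) {
    FILE *fin = fopen("input.txt", "r");
    if (!fin) { perror("input.txt"); return 1; }

    int *buf = malloc(CHUNK * sizeof *buf);
    if (!buf) { perror("malloc"); return 1; }

    /* Phase 1: read a chunk, sort it in memory, write it out as a run. */
    int nruns = 0;
    for (;;) {
        int n = 0;
        while (n < CHUNK && fscanf(fin, "%d", &buf[n]) == 1)
            n++;
        if (n == 0) break;
        qsort(buf, n, sizeof *buf, cmp_int);

        char name[64];
        snprintf(name, sizeof name, "run%d.txt", nruns++);
        FILE *run = fopen(name, "w");
        for (int i = 0; i < n; i++)
            fprintf(run, "%d\n", buf[i]);
        fclose(run);
    }
    fclose(fin);
    free(buf);

    /* Phase 2: keep one "current" value per run and repeatedly emit the
       smallest; a heap would make each pick O(log k) instead of O(k). */
    FILE **runs = malloc(nruns * sizeof *runs);
    int *head = malloc(nruns * sizeof *head);
    int *live = malloc(nruns * sizeof *live);
    for (int r = 0; r < nruns; r++) {
        char name[64];
        snprintf(name, sizeof name, "run%d.txt", r);
        runs[r] = fopen(name, "r");
        live[r] = (fscanf(runs[r], "%d", &head[r]) == 1);
    }

    FILE *fout = fopen("sorted.txt", "w");
    for (;;) {
        int best = -1;
        for (int r = 0; r < nruns; r++)
            if (live[r] && (best < 0 || head[r] < head[best]))
                best = r;
        if (best < 0) break;               /* all runs exhausted */
        fprintf(fout, "%d\n", head[best]);
        live[best] = (fscanf(runs[best], "%d", &head[best]) == 1);
    }
    fclose(fout);
    for (int r = 0; r < nruns; r++) fclose(runs[r]);
    return 0;
}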

But if you're on Unix, it's probably easier to run the sort command in a separate process.
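For example, a minimal way to do that from C (file names are illustrative; -S is a GNU coreutils extension that caps sort's in-memory buffer):

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* -n sorts numerically, -S 100M limits the in-memory buffer, and
       -o names the output file; GNU sort spills to temporary files on
       disk by itself when the input does not fit in that budget */
    int rc = system("sort -n -S 100M -o sorted.txt input.txt");
    if (rc != 0) {
        fprintf(stderr, "sort failed (status %d)\n", rc);
        return 1;
    }
    return 0;
}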

BTW, opening files that are bigger than 2 GB requires large file support. Depending on your operating system and your libraries, you may need to define a macro or call other file handling functions.
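On Linux with glibc, for instance, the macro in question is _FILE_OFFSET_BITS: defining it as 64 before any header is included (or compiling with -D_FILE_OFFSET_BITS=64) makes off_t 64-bit, so fopen, fseeko, and ftello work past 2 GB. On 64-bit Linux off_t is already 64-bit, so this mainly matters for 32-bit builds. A minimal sketch (the file name is illustrative):

#define _FILE_OFFSET_BITS 64    /* must precede every #include */
#define _POSIX_C_SOURCE 200809L /* exposes fseeko/ftello prototypes */

#include <stdio.h>

int main(void) {
    FILE *f = fopen("big.txt", "r");
    if (f == NULL) { perror("big.txt"); return 1; }

    /* fseeko/ftello use off_t, which is now 64-bit */
    fseeko(f, 0, SEEK_END);
    printf("size: %lld bytes\n", (long long)ftello(f));
    fclose(f);
    return 0;
}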
