
Efficient and scalable way to sort a large number of strings in Java

I am looking for ideas on sorting a large number of strings from an input file and writing the sorted results to a new file in Java. The input file could be extremely large, so the solution needs to take performance into account. Any ideas?

External sorting is the technique generally used to sort huge amounts of data. It may be what you need.

externalsortinginjava is a Java library for this.

Is an SQL database available? If you inserted all the data into a table with the sortable column or section indexed, you may (or may not) be able to output the sorted result more efficiently. This solution may also help if the amount of data exceeds the amount of RAM available.

It would be interesting to know how large the file is, and what the purpose is.

Break the file into chunks that fit in memory. Sort each chunk and write it to its own file. (If everything fits into memory, you are done after the first sort.) Then merge the sorted files into a single sorted file.
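The split, sort, and merge steps described above could look roughly like the following. This is only a minimal sketch, not a tuned implementation: the class name `ExternalSortSketch`, the `maxLinesInMemory` parameter, and the choice of UTF-8 are assumptions made for illustration, and lines are ordered by plain `String.compareTo`.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class: chunk the input, sort each chunk, then k-way merge.
final class ExternalSortSketch {

    /** Sorts the lines of input into output, holding at most maxLinesInMemory lines per chunk. */
    static void sort(Path input, Path output, int maxLinesInMemory) throws IOException {
        List<Path> runs = splitIntoSortedRuns(input, maxLinesInMemory);
        try {
            mergeRuns(runs, output);
        } finally {
            for (Path run : runs) Files.deleteIfExists(run); // clean up temporary files
        }
    }

    /** Reads the input chunk by chunk and writes each chunk, sorted, to its own temp file. */
    static List<Path> splitIntoSortedRuns(Path input, int maxLinesInMemory) throws IOException {
        List<Path> runs = new ArrayList<>();
        List<String> chunk = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() >= maxLinesInMemory) {
                    runs.add(writeSortedRun(chunk));
                    chunk.clear();
                }
            }
        }
        if (!chunk.isEmpty()) runs.add(writeSortedRun(chunk));
        return runs;
    }

    private static Path writeSortedRun(List<String> chunk) throws IOException {
        Collections.sort(chunk);
        Path run = Files.createTempFile("sort-run-", ".txt");
        Files.write(run, chunk, StandardCharsets.UTF_8);
        return run;
    }

    /** One open run file plus the line currently at its head. */
    private static final class RunCursor {
        final BufferedReader reader;
        String head;

        RunCursor(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.head = reader.readLine();
        }

        void advance() throws IOException {
            head = reader.readLine();
        }
    }

    /** K-way merge: repeatedly emit the smallest head line among all runs. */
    static void mergeRuns(List<Path> runs, Path output) throws IOException {
        PriorityQueue<RunCursor> heap = new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path run : runs) {
                BufferedReader reader = Files.newBufferedReader(run, StandardCharsets.UTF_8);
                readers.add(reader);
                RunCursor cursor = new RunCursor(reader);
                if (cursor.head != null) heap.add(cursor);
            }
            while (!heap.isEmpty()) {
                RunCursor smallest = heap.poll();
                writer.write(smallest.head);
                writer.newLine();
                smallest.advance();
                if (smallest.head != null) heap.add(smallest); // re-insert with its next line
            }
        } finally {
            for (BufferedReader reader : readers) reader.close();
        }
    }
}
```

A call such as `ExternalSortSketch.sort(Paths.get("in.txt"), Paths.get("out.txt"), 1_000_000)` would use it. Note that this sketch opens every run file at once during the merge, so a very large input with a small chunk size could hit the operating system's open-file limit; merging in several passes would avoid that.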

You can also use a form of radix sort to improve CPU efficiency, but the main bottleneck is all the re-writing and re-reading you have to do.
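One way to read the radix idea is a single distribution pass on the first character: write each line to a bucket file chosen by its first character, sort each bucket on its own, and concatenate the buckets in key order, so no merge step is needed. The sketch below is an illustration under assumptions, not the answerer's exact method: the class name `RadixPartitionSketch` is hypothetical, ordering follows `String.compareTo`, each bucket is assumed to fit in memory, and the number of distinct first characters is assumed small enough to keep one file open per bucket.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class: one radix-style distribution pass on the first character.
final class RadixPartitionSketch {

    static void sort(Path input, Path output) throws IOException {
        // Bucket key: first UTF-16 code unit of the line, or -1 for an empty line.
        // TreeMap iterates keys in ascending order, which matches how
        // String.compareTo orders strings whose first characters differ.
        Map<Integer, Path> bucketFiles = new TreeMap<>();
        Map<Integer, BufferedWriter> writers = new TreeMap<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int key = line.isEmpty() ? -1 : line.charAt(0);
                BufferedWriter writer = writers.get(key);
                if (writer == null) {
                    Path file = Files.createTempFile("bucket-", ".txt");
                    bucketFiles.put(key, file);
                    writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
                    writers.put(key, writer);
                }
                writer.write(line);
                writer.newLine();
            }
        } finally {
            for (BufferedWriter writer : writers.values()) writer.close();
        }

        // Every line in bucket k sorts before every line in bucket k+1, so sorting
        // each bucket independently and appending them in key order is enough.
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path file : bucketFiles.values()) {
                List<String> lines = new ArrayList<>(Files.readAllLines(file, StandardCharsets.UTF_8));
                Collections.sort(lines);
                for (String s : lines) {
                    out.write(s);
                    out.newLine();
                }
                Files.delete(file);
            }
        }
    }
}
```

The input is still read once and written twice, so the I/O cost the answer mentions remains. What this saves is comparison work, since each in-memory sort only sees lines that share a first character. A bucket that is still too large for memory could be split again on its second character.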

