
Writing resultset to file with sorted output

I want to write the "random" output of my result set (about 1.5 million rows) to a file in sorted order. I know I can use an ORDER BY clause in my query, but that is "expensive". Is there an algorithm for writing the result set rows to a file so that the content ends up sorted, and would I gain any performance from it? I'm using Java 1.6, and the query has multiple joins.

Define an index on the sort criteria in your table; then you can use the ORDER BY clause without problems and write the file as the rows come from the result set.

If your query has multiple joins, create the proper indexes for the joins and for the sort criteria. You can sort the data in your program, but you'd be wasting time. That time is far better spent learning how to properly tune and use your database than reinventing sorting algorithms the database engine already has.

Grab your database's profiler and check the query's execution plan.
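A minimal JDBC sketch of that approach. The `orders`/`customers` tables, their columns and the index names in the comments are hypothetical placeholders, not from the question. The sorting stays in the database; the Java side only streams rows to the file in the order the `ResultSet` delivers them. It sticks to Java 6 syntax (no try-with-resources) because the question targets Java 1.6.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class SortedExport {
    // Hypothetical schema: "orders" and "customers" are placeholder names.
    // Supporting indexes would be created once in the DB, for example:
    //   CREATE INDEX idx_orders_customer ON orders (customer_id);
    //   CREATE INDEX idx_customers_name ON customers (name);
    static final String SQL =
        "SELECT c.name, o.id FROM orders o JOIN customers c ON c.id = o.customer_id "
        + "ORDER BY c.name, o.id";

    // Runs the ORDER BY query and writes rows as they arrive,
    // instead of collecting them into a Java collection first.
    static int export(Connection conn, File target) throws SQLException, IOException {
        Statement st = conn.createStatement();
        try {
            st.setFetchSize(1000); // only a hint; how drivers honour it varies
            ResultSet rs = st.executeQuery(SQL);
            Writer out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(target), "UTF-8"));
            try {
                return writeRows(rs, 2, out);
            } finally {
                out.close();
            }
        } finally {
            st.close(); // closing the Statement also closes its ResultSet
        }
    }

    // Writes each row as tab-separated columns, in ResultSet order.
    static int writeRows(ResultSet rs, int columnCount, Writer out)
            throws SQLException, IOException {
        int rows = 0;
        while (rs.next()) {
            for (int i = 1; i <= columnCount; i++) { // JDBC columns are 1-based
                if (i > 1) out.write('\t');
                String value = rs.getString(i);
                out.write(value == null ? "" : value);
            }
            out.write('\n');
            rows++;
        }
        return rows;
    }
}
```

Some drivers keep the whole result in memory unless extra conditions hold (PostgreSQL's driver, for example, only fetches in batches with autocommit turned off), so check your driver's documentation before relying on `setFetchSize`.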

In my experience, sorting on the database side is usually just as fast or faster.

If you're reading from a database, getting sorted output shouldn't be so 'expensive' if you have appropriate indexes.

But sometimes, with complex queries, it's very hard for the SQL optimiser to use indexes. In that case the DB simply accumulates the results in a temporary table and sorts them for you, transparently.

It's very unlikely that you could match the level of optimisation built into your DB engine; but if your problem arises because you're doing some post-processing of the data that negates any sorting done by the DB, then you have no alternative but to sort it yourself.
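If you do have to sort yourself and the rows might not fit in memory, the standard technique is an external merge sort: sort fixed-size chunks in memory, spill each chunk to a temporary file (a "run"), then merge all runs at once with a priority queue. A minimal sketch, assuming each row has already been rendered as one line of text whose natural `String` order is the order you want; the chunk size is a caller-chosen parameter, not a tuned value. Java 6 syntax, to match the question.

```java
import java.io.*;
import java.util.*;

class ExternalSort {
    // Phase 1: sorted runs on disk. Phase 2: k-way merge into target.
    static void sort(Iterator<String> lines, int maxLinesInMemory, File target)
            throws IOException {
        List<File> runs = new ArrayList<File>();
        List<String> chunk = new ArrayList<String>();
        try {
            while (lines.hasNext()) {
                chunk.add(lines.next());
                if (chunk.size() >= maxLinesInMemory) {
                    runs.add(writeRun(chunk));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) runs.add(writeRun(chunk));
            merge(runs, target);
        } finally {
            for (File f : runs) f.delete(); // remove temporary run files
        }
    }

    // Sorts one chunk in memory and spills it to a temporary file.
    private static File writeRun(List<String> chunk) throws IOException {
        Collections.sort(chunk);
        File f = File.createTempFile("sortrun", ".txt");
        Writer w = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(f), "UTF-8"));
        try {
            for (String s : chunk) { w.write(s); w.write('\n'); }
        } finally {
            w.close();
        }
        return f;
    }

    // One open run plus its smallest line not yet written to the output.
    private static final class Head {
        final BufferedReader reader;
        String line;
        Head(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.line = reader.readLine();
        }
    }

    // Repeatedly emits the smallest head line across all runs.
    private static void merge(List<File> runs, File target) throws IOException {
        PriorityQueue<Head> heap = new PriorityQueue<Head>(Math.max(1, runs.size()),
            new Comparator<Head>() {
                public int compare(Head a, Head b) { return a.line.compareTo(b.line); }
            });
        List<BufferedReader> open = new ArrayList<BufferedReader>();
        Writer out = new BufferedWriter(
            new OutputStreamWriter(new FileOutputStream(target), "UTF-8"));
        try {
            for (File f : runs) {
                BufferedReader r = new BufferedReader(
                    new InputStreamReader(new FileInputStream(f), "UTF-8"));
                open.add(r);
                Head h = new Head(r);
                if (h.line != null) heap.add(h);
            }
            while (!heap.isEmpty()) {
                Head h = heap.poll();
                out.write(h.line);
                out.write('\n');
                h.line = h.reader.readLine(); // advance this run
                if (h.line != null) heap.add(h);
            }
        } finally {
            for (BufferedReader r : open) r.close();
            out.close();
        }
    }
}
```

Memory use is bounded by roughly one chunk plus one line per run; for 1.5 million rows, a chunk of 100,000 lines gives 15 runs to merge.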

Again, the easiest approach would be to use the DB: simply write to a temporary table with an appropriate index and dump from there.

If you're certain that the data will always fit in RAM, you can sort it in memory. That's the only case in which you might be able to beat the DB engine, simply because you know you won't need disk access.
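When the rows do fit in RAM, one `ArrayList` sorted once with `Collections.sort` and a `Comparator` is usually all that's needed. The `Row` class and its `name`/`id` fields are hypothetical stand-ins for whatever columns the query returns, and reading them out of the `ResultSet` is left out. Java 6 syntax (anonymous `Comparator`, no lambdas).

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class InMemorySort {
    // Hypothetical row type standing in for the columns read from the ResultSet.
    static final class Row {
        final String name;
        final int id;
        Row(String name, int id) { this.name = name; this.id = id; }
    }

    // Sort key: name first, then id as a tie-breaker.
    static final Comparator<Row> BY_NAME_THEN_ID = new Comparator<Row>() {
        public int compare(Row a, Row b) {
            int c = a.name.compareTo(b.name);
            if (c != 0) return c;
            return a.id < b.id ? -1 : (a.id == b.id ? 0 : 1);
        }
    };

    // Sorts the list in place (stable, O(n log n)), then writes it out.
    static void writeSorted(List<Row> rows, Writer out) throws IOException {
        Collections.sort(rows, BY_NAME_THEN_ID);
        for (Row r : rows) {
            out.write(r.name + "\t" + r.id + "\n");
        }
    }
}
```

`Collections.sort` is stable, so rows with equal keys keep their relative order; the `id` tie-breaker additionally makes the output independent of the "random" arrival order, as long as each (`name`, `id`) pair is unique.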

But that's a lot of 'ifs'. Better to stay with your DB.

If you need the data sorted, someone has to do it: either you or the database. It's certainly less effort to add the ORDER BY to the query. But there's no reason you can't sort it in memory on your side. The easiest way is to load the data into a sorted collection (TreeSet, TreeMap) using a Comparator that sorts on the column you need. Then write out the sorted data.
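A sketch of the TreeMap variant described above, assuming each row has already been copied out of the `ResultSet` into a `String[]` and that neither the sort column nor any other value is null (natural-order `TreeMap` keys cannot be null). One caveat with `TreeSet<String[]>` plus a column comparator: it silently drops every row that compares equal to one already present, so this version maps each key to a list of rows instead. Java 6 syntax, to match the question.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;

class TreeMapSort {
    // Buckets rows by the value in keyColumn; the TreeMap keeps keys ordered.
    // keyOrder == null means natural String order. Assumes no null values.
    static void writeSorted(Iterator<String[]> rows, int keyColumn,
                            Comparator<String> keyOrder, Writer out) throws IOException {
        TreeMap<String, List<String[]>> byKey = new TreeMap<String, List<String[]>>(keyOrder);
        while (rows.hasNext()) {
            String[] row = rows.next();
            List<String[]> bucket = byKey.get(row[keyColumn]);
            if (bucket == null) {
                bucket = new ArrayList<String[]>();
                byKey.put(row[keyColumn], bucket);
            }
            bucket.add(row); // rows with equal keys keep their arrival order
        }
        for (List<String[]> bucket : byKey.values()) { // values() iterates in key order
            for (String[] row : bucket) {
                for (int i = 0; i < row.length; i++) {
                    if (i > 0) out.write('\t');
                    out.write(row[i]);
                }
                out.write('\n');
            }
        }
    }
}
```

Passing `null` as `keyOrder` gives natural `String` order, which is `TreeMap`'s documented behaviour for a null comparator. For 1.5 million rows, keep in mind that a `TreeMap` allocates an entry object per distinct key, so sorting an `ArrayList` once is usually lighter on memory.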
