How do you sort a generator with 1.72e+10 elements in Python by its 2nd "column"?
I have a generator object containing 2 "columns" and 1.72e+10 "rows". I want to sort it by the second column, but using sorted like this takes an insane amount of time. I don't know if it makes sense to talk about columns and rows in generators, but I use quotation marks because I don't know what else to call them.
sorted_list = sorted(list(generator_obj),
                     key=lambda x: x[1],
                     reverse=True)
Is there any smart way of sorting a huge generator object quickly? I don't have a computer science background, so I am also wondering whether it is inevitable to spend a huge amount of time sorting such a large object. Also, I believe it is faster to convert generator_obj to a list before supplying it as an argument to sorted(), which is why I don't pass in the raw generator.
I have also tried sorting in place, with the same result as above:
list(generator_obj).sort(key=lambda x: x[1], reverse=True)
Sorting requires ~N log N time if all you have is a comparison function.
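To put a rough number on that, here is a back-of-the-envelope sketch for the 1.72e+10 rows from the question. It assumes a base-2 logarithm and ignores constant factors, so treat the result as an order of magnitude and not as a prediction of running time.

```python
import math

n = 1.72e10  # number of rows, taken from the question

# A comparison-based sort needs on the order of n * log2(n) comparisons.
# Assumption: base-2 logarithm, constant factors ignored.
log_n = math.log2(n)
comparisons = n * log_n

print(f"log2(n)     ~ {log_n:.1f}")
print(f"n * log2(n) ~ {comparisons:.2e}")
```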
If your keys are integers, you can sort in ~N time using radix sort.
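As an illustration only, here is a minimal sketch of a least-significant-digit radix sort on the second "column". It assumes the keys are non-negative integers that fit in key_bits bits; the function name and both parameters are made up for this example. It still materialises all rows in memory, and in pure Python it is unlikely to beat the built-in sorted(), so read it as a demonstration of the idea and not as a drop-in fix.

```python
def radix_sort_by_second(rows, key_bits=32, radix_bits=8):
    """Return rows sorted ascending by row[1] using LSD radix sort.

    Assumption: every row[1] is a non-negative integer < 2**key_bits.
    """
    rows = list(rows)  # the whole input is held in memory
    mask = (1 << radix_bits) - 1

    # One stable bucketing pass per digit, least significant digit first.
    for shift in range(0, key_bits, radix_bits):
        buckets = [[] for _ in range(1 << radix_bits)]
        for row in rows:
            buckets[(row[1] >> shift) & mask].append(row)
        # Concatenating buckets in order keeps earlier passes' ordering.
        rows = [row for bucket in buckets for row in bucket]
    return rows


data = [("a", 5), ("b", 300), ("c", 0), ("d", 70000), ("e", 5)]
ascending = radix_sort_by_second(iter(data))
descending = ascending[::-1]  # note: this also reverses the order of ties
print(descending)
```

Each pass touches every row once, so the total work is proportional to N times the number of digits, with no comparisons between keys.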
Either way, sorting a "huge" sequence requires "huge" time.