Creating smaller chunks from a large file and sorting the chunks

I am implementing an external sort in Python and am currently stuck on this problem. I have divided a large text file containing integers into small chunks, and I am trying to sort these chunks. This is what I have written so far:

with open(fpath,'rb') as fin:
    input_iter = iter(lambda: fin.read(40 * 1024),'')
    for item in input_iter:
        print item
        current_chunk = list(item)
        # sort the buffers
        current_chunk.sort(key = lambda x : int(x))

When I execute this code, I get an error:

File "problem3.py", line 68, in <lambda>
current_chunk.sort(key = lambda x : int(x))
ValueError: invalid literal for int() with base 10: ''

I guess this is caused by the line input_iter = iter(lambda: fin.read(40 * 1024),''). Is there an alternative way to overcome this problem? Thank you.

You have whitespace in your input:

>>> int(' ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: ''
>>> int('\n')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: ''
>>> int('\t')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: ''

Whitespace is stripped when converting to int, hence the confusing error message; note how there is nothing between the quotes in the exception message (Python 3 has fixed this).
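A minimal sketch of this behaviour, assuming Python 3 (the question's code is Python 2): int() ignores whitespace around digits but raises ValueError for a string that is empty or contains only whitespace. The helper name to_int_or_none is hypothetical and only serves the illustration.

```python
def to_int_or_none(text):
    """Return int(text), or None when text is empty or whitespace-only."""
    stripped = text.strip()
    if not stripped:
        # int('') and int(' ') both raise ValueError, so skip them here
        return None
    return int(stripped)


print(to_int_or_none(" 42\n"))  # whitespace around the digits is ignored
print(to_int_or_none("\t"))     # whitespace-only input is reported as None
```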

Strip whitespace:

current_chunk = filter(None, map(str.strip, item))

or avoid converting the whitespace entries to integers:

current_chunk.sort(key=lambda x: int(x) if x.strip() else x)
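Note that list(item) turns a string into a list of single characters, so both fixes above sort digits rather than whole numbers. If the goal is to sort the integers themselves, one possible approach is sketched below. It assumes Python 3, a text file whose numbers are separated by whitespace or newlines, and a hypothetical helper named sorted_chunks. It reads whole lines with readlines(hint) so that a number is not cut in half at a chunk boundary.

```python
import io


def sorted_chunks(fin, size_hint=40 * 1024):
    """Yield sorted lists of integers, reading roughly size_hint characters of whole lines per chunk."""
    while True:
        lines = fin.readlines(size_hint)
        if not lines:
            break  # end of file
        # split() with no arguments drops all whitespace, so blank lines yield no tokens
        numbers = [int(token) for line in lines for token in line.split()]
        numbers.sort()
        if numbers:
            yield numbers


# Small in-memory stand-in for the large file (illustrative data only)
sample = io.StringIO("5\n3\n\n10 2\n7\n")
chunks = list(sorted_chunks(sample, size_hint=4))
print(chunks)
```

With a real file, the same generator could be driven by "with open(fpath) as fin:" followed by a loop over sorted_chunks(fin), writing each sorted chunk to its own temporary file for the later merge step.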
