Creating smaller chunks from a large file and sorting the chunks
I am implementing an external sort in Python and am currently stuck on this problem. I have divided a large text file containing integers into small chunks, and I am trying to sort these chunks. This is what I have written so far:
with open(fpath,'rb') as fin:
    input_iter = iter(lambda: fin.read(40 * 1024),'')
    for item in input_iter:
        print item
        current_chunk = list(item)
        # sort the buffers
        current_chunk.sort(key = lambda x : int(x))
When I execute this code, I get an error:
  File "problem3.py", line 68, in <lambda>
    current_chunk.sort(key = lambda x : int(x))
ValueError: invalid literal for int() with base 10: ''
I guess it is caused by this line:
    input_iter = iter(lambda: fin.read(40 * 1024),'')
Is there an alternative way to overcome this problem? Thank you.
You have whitespace in your input:
>>> int(' ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: ''
>>> int('\n')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: ''
>>> int('\t')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: ''
Whitespace is stripped when converting to int, hence the confusing error message; note how there is nothing between the quotes in the exception message (Python 3 has fixed this).
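To see this behaviour in isolation, here is a minimal sketch. `try_int` is a hypothetical helper introduced only for illustration; it is not part of the question's code.

```python
def try_int(text):
    """Return int(text), or None when there are no digits left to parse."""
    try:
        return int(text)
    except ValueError:
        return None

# Surrounding whitespace is ignored when digits are present.
print(try_int(' 42\n'))
# A whitespace-only string leaves nothing to parse, so int() raises.
print(try_int('\n'))
```

The first call succeeds because `int()` ignores leading and trailing whitespace; the second fails because only whitespace was passed in.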
Strip spaces:
current_chunk = filter(None, map(str.strip, item))
or avoid turning them into integers:
current_chunk.sort(key=lambda x: int(x) if x.strip() else x)
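A further point worth checking: `list(item)` on a string yields single characters, so multi-digit numbers would be sorted digit by digit. The sketch below is one possible alternative, not the method from the question. It assumes the file holds whitespace-separated integers, is written for Python 3, and uses a hypothetical function name `sorted_chunks`. It reads whole lines with `readlines(size_hint)` so that no number is cut at a chunk boundary, and `str.split()` with no arguments drops all whitespace before `int()` is called.

```python
import io

def sorted_chunks(fin, size_hint=40 * 1024):
    """Yield sorted lists of integers, one list per chunk of whole lines."""
    while True:
        # readlines(hint) stops once roughly `hint` characters have been read,
        # but always returns complete lines.
        lines = fin.readlines(size_hint)
        if not lines:
            break
        # split() without arguments discards blank lines and stray whitespace.
        numbers = [int(tok) for line in lines for tok in line.split()]
        numbers.sort()
        if numbers:
            yield numbers

# Small in-memory stand-in for the large file (an assumption for the demo).
sample = io.StringIO(u"5\n3\n\n10 2\n")
chunks = list(sorted_chunks(sample))
print(chunks)  # → [[2, 3, 5, 10]]
```

With a real file, the same function can be called as `sorted_chunks(open(fpath))`; each yielded list could then be written to its own temporary file for the merge phase of the external sort.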