
Quickly alphabetize a large file via python

#!/usr/bin/python

import random
import string

# Generator: append 100,000 lines of the form STRING:STRING
with open("appendedFile", "a") as appendToFile:
    for i in range(100000):
        # 15 random letters (upper- and lower-case) on each side of the colon
        chars = "".join(random.choice(string.ascii_letters) for _ in range(15))
        chars2 = "".join(random.choice(string.ascii_letters) for _ in range(15))
        appendToFile.write(chars + ":" + chars2 + "\n")

Code modified from this question.

The above code generates 100,000 lines of random text in the format STRING:STRING. The resulting text file is 3.1 MB.

How would one rapidly alphabetise the file, using the first STRING in STRING:STRING? Case is irrelevant.

Bubble sort is very slow, no?

The obvious first approach is simply to use the built-in sort feature in Python. Is this not what you had in mind? If not, why? With only 100,000 lines of random text, the built-in sort would be very fast.

lst = open("appendedFile", "rt").readlines()
lst.sort(key=str.lower)

Done. We could do it as a one-liner if you really wanted to:

lst = sorted(open("appendedFile", "rt").readlines(), key=str.lower)

EDIT: I just checked, and string.letters (string.ascii_letters in Python 3) includes both upper-case and lower-case letters, so the code above has been modified to be case-insensitive.

EDIT: more on sorting in Python: http://wiki.python.org/moin/HowTo/Sorting
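
Sorting whole lines with key=str.lower also compares the text after the colon whenever two first fields are equal ignoring case. If only the first STRING should decide the order, a key function can isolate it. A minimal sketch, assuming every line contains a ":" separator; the helper name first_field_key and the sample lines are made up for illustration:

```python
def first_field_key(line):
    """Return the text before the first ':' in lower case, for use as a sort key."""
    return line.split(":", 1)[0].lower()


# Hypothetical sample lines in the STRING:STRING format used above.
lines = ["delta:Xy\n", "Alpha:qq\n", "charlie:AB\n", "Bravo:zz\n"]

# Only the first field, lower-cased, takes part in the comparison.
ordered = sorted(lines, key=first_field_key)
print("".join(ordered))
```

Because Python's sort is stable, lines whose first fields match ignoring case keep their original relative order.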

This is very fast (under 1 second on my computer). It uses a case-insensitive sort, which I assume is what you mean by "case is irrelevant"?

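#!/usr/bin/python

# Read every line, sort case-insensitively, and write the result out.
# The with statement closes both files, so the output is fully flushed.
with open("appendedFile", "r") as appendToFile, open("sortedFile", "w") as sortToFile:
    for line in sorted(appendToFile, key=str.lower):
        sortToFile.write(line)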

Try this (case insensitive):

l = open("appendedFile").readlines()
l.sort(key=lambda x: x.lower())

For files of this size, optimisation is not really necessary (timings on my slow machine ;-):

christophe@orion:~$ time python -c "l=file('appendedFile').readlines();l.sort(key=lambda x:x.lower())"

real    0m0.615s
user    0m0.576s
sys 0m0.024s
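
A comparable measurement can be taken from inside Python instead of with the shell's time command. This is a rough sketch, assuming Python 3; make_lines and its parameters are hypothetical helpers that build the data in memory rather than reading appendedFile, so the result excludes file I/O and will not match the figures above:

```python
import random
import string
import time


def make_lines(count, width=15, seed=0):
    """Build `count` random lines in STRING:STRING format, in memory."""
    rng = random.Random(seed)  # fixed seed only so that runs are repeatable

    def word():
        return "".join(rng.choice(string.ascii_letters) for _ in range(width))

    return [word() + ":" + word() + "\n" for _ in range(count)]


lines = make_lines(100000)

# Time only the case-insensitive sort, not the data generation.
start = time.perf_counter()
ordered = sorted(lines, key=str.lower)
elapsed = time.perf_counter() - start

print("sorted %d lines in %.3f s" % (len(ordered), elapsed))
```

The elapsed time depends on the machine and the Python version, so treat any single run as indicative only.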


 