
Quickly alphabetize a large file via python


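import random
import string

# Generator: append 100,000 lines of the form STRING:STRING
with open("appendedFile", "a") as appendToFile:
    for i in range(100000):
        # string.ascii_letters exists in both Python 2 and 3
        # (string.letters and xrange are Python 2 only)
        chars = "".join(random.choice(string.ascii_letters) for _ in range(15))
        chars2 = "".join(random.choice(string.ascii_letters) for _ in range(15))

        appendToFile.write(chars + ":" + chars2 + "\n")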


Code modified from this question.

The above code generates 100,000 lines of random text in the format STRING:STRING. The resulting text file is about 3.1 MB.

How would one rapidly alphabetise the file, using the first STRING in STRING:STRING? Case is irrelevant.

Bubble sort is very slow, no?

The obvious first approach is simply to use the built-in sort feature in Python. Is this not what you had in mind? If not, why? With only 100,000 lines of random text, the built-in sort would be very fast.

lst = open("appendedFile", "rt").readlines()
lst.sort(key=str.lower)

Done. We could do it as a one-liner if you really wanted to:

lst = sorted(open("appendedFile", "rt").readlines(), key=str.lower)

EDIT: I just checked, and string.letters includes both upper-case and lower-case letters, so the code above has been modified to sort case-insensitively.

EDIT: more on sorting in Python: http://wiki.python.org/moin/HowTo/Sorting
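
If the sort should look only at the first STRING rather than at the whole line, the key function can split on the first colon. This is a minimal sketch, assuming Python 3 and the STRING:STRING layout from the question; first_field_key is a hypothetical helper name and the sample lines are made up for illustration:

```python
def first_field_key(line):
    """Return the lower-cased text before the first colon (hypothetical helper)."""
    return line.split(":", 1)[0].lower()

# Made-up sample lines in the STRING:STRING format
lines = ["banana:X\n", "Apple:y\n", "cherry:Z\n", "apple:A\n"]

# sorted() is stable, so lines whose keys compare equal keep their input order
result = sorted(lines, key=first_field_key)
```

With key=str.lower the second STRING also takes part in the comparison whenever two first fields match case-insensitively; with the helper above such ties keep their original order instead.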

This is very fast (under 1 second on my computer). It uses a case-insensitive sort, which I assume is what you mean by "case is irrelevant"?


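appendToFile = open("appendedFile", "r")
sortToFile = open("sortedFile", "w")

for line in sorted(appendToFile, key = str.lower):
    sortToFile.write(line)  # write each line out in sorted order

appendToFile.close()
sortToFile.close()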
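
The same read, sort, and write steps can also be wrapped in a function that uses context managers, so both files are closed even if an error occurs. This is a sketch only, assuming Python 3; sort_file is a hypothetical name, and it assumes every line in the input, including the last, ends with a newline (as the generator in the question guarantees):

```python
def sort_file(src_path, dst_path):
    """Sort the lines of src_path case-insensitively and write them to dst_path.

    Hypothetical helper. Reads the whole file into memory, which is fine
    for a file of a few megabytes.
    """
    with open(src_path, "r") as src:
        # Iterating a file object yields its lines, newline included
        lines = sorted(src, key=str.lower)
    with open(dst_path, "w") as dst:
        dst.writelines(lines)
    return len(lines)
```

Calling sort_file("appendedFile", "sortedFile") would then produce the same output file as the loop above.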

Try this (case insensitive):

l.sort(key=lambda x:x.lower())

For these kinds of sizes, optimisation is not really necessary (timings on my slow machine ;-):

christophe@orion:~$ time python -c "l=file('appendedFile').readlines();l.sort(key=lambda x:x.lower())"

real    0m0.615s
user    0m0.576s
sys     0m0.024s
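
A comparable measurement can be taken from inside Python. The sketch below assumes Python 3 (the file() builtin used in the command above exists only in Python 2) and builds the lines in memory rather than reading appendedFile; make_lines and the fixed seed are illustrative choices, and the elapsed time will vary from machine to machine:

```python
import random
import string
import time

def make_lines(n, seed=0):
    """Build n random STRING:STRING lines in memory (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so repeated runs sort the same data

    def word():
        return "".join(rng.choice(string.ascii_letters) for _ in range(15))

    return [word() + ":" + word() + "\n" for _ in range(n)]

lines = make_lines(100000)

start = time.perf_counter()
lines.sort(key=str.lower)  # the same case-insensitive sort as above
elapsed = time.perf_counter() - start

print("sorted %d lines in %.3f s" % (len(lines), elapsed))
```

Only the sort itself is timed here, so the figure leaves out interpreter start-up and file reading, both of which the shell's time command includes.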

