How to sort the contents of a file in Python

I'm trying to figure out a simple way to sort the words from a file, but the newline characters ("\n") always show up when I print the words. How can I improve this code to make it work properly? I'm using Python 2.7. Thanks in advance.

def sorting(self):
    filename = ("food.txt")
    file_handle = open(filename, "r")
    for word in file_handle:
        word = word.split()
        print sorted(file_handle)
    file_handle.close()

Basically all you have to do is strip that newline (and all other whitespace because you probably don't want it):

def sorting(self):
    filename = ("food.txt")
    file_handle = open(filename, "r")
    for line in file_handle:
        # strip() drops the trailing newline, split() breaks the line into words
        words = line.strip().split()
        print sorted(words)
    file_handle.close()

Otherwise you can just remove the last character with line[:-1].split(), although that assumes every line, including the last one, ends with a newline.
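A small sketch of why slicing off the last character can bite, using made-up strings rather than the real food.txt: the final line of a file often has no trailing newline, so line[:-1] cuts off a real letter, while rstrip("\n") only removes a newline when one is there. The snippet is written so it should run on both Python 2.7 and 3.

```python
# Hypothetical sample lines: the final line of a file often lacks "\n".
with_newline = "kiwi\n"
last_line = "kiwi"

# Slicing removes the last character whether or not it is a newline.
sliced_ok = with_newline[:-1]
sliced_bad = last_line[:-1]

# rstrip("\n") removes trailing newlines only, so both lines survive intact.
stripped_ok = with_newline.rstrip("\n")
stripped_last = last_line.rstrip("\n")

print([sliced_ok, sliced_bad, stripped_ok, stripped_last])
```

Here sliced_bad ends up as "kiw", which is the case to watch out for.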

You actually have two problems here.


The big one is that print sorted(file_handle) reads and sorts the whole rest of the file and prints that out. You're doing that once per line. So, what happens is that you read the first line, split it, ignore the result, sort and print all the lines after the first, and then you're done.
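To see that behaviour in isolation, here is a minimal sketch that uses an in-memory io.StringIO object as a stand-in for the open file (the contents are invented, and a real file object is assumed to behave the same way for this purpose). It records what the loop sees and what sorted() receives instead of printing inside the loop.

```python
import io

# Stand-in for food.txt with hypothetical contents, so no file is needed.
handle = io.StringIO(u"pear apple\nkiwi\nbanana fig\n")

seen_by_loop = []   # lines the for loop itself receives
sorted_calls = []   # what each sorted(handle) call returns

for line in handle:
    seen_by_loop.append(line)
    # sorted() consumes every remaining line from the same iterator,
    # so nothing is left for the next pass of the loop.
    sorted_calls.append(sorted(handle))

print(seen_by_loop)
print(sorted_calls)
```

The loop body runs only once: seen_by_loop holds just the first line, and the single sorted() call holds the other two lines, still with their newlines attached.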

What you want to do is accumulate all the words as you go along, then sort and print that. Like this:

def sorting(self):
    filename = ("food.txt")
    file_handle = open(filename, "r")
    words = []
    for line in file_handle:
        words += line.split()
    file_handle.close()
    print sorted(words)

Or, if you want to print the sorted words one per line, instead of as a giant list, change the last line to:

print '\n'.join(sorted(words))

For the second, more minor problem, the one you asked about, you just need to strip off the newlines. So, change the words += line.split() line to this:

words += line.strip().split()

However, if you had solved the first problem, you wouldn't even have noticed this one. If you have a line like "one two three\n", and you call split() on it, you will get back ["one", "two", "three"], with no \n to worry about. So, you don't actually even need to solve this one.
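A quick check of that claim, as a sketch with a literal string in place of a line read from the file. It also shows the one case where the newline does survive, which is split(" ") with an explicit separator.

```python
line = "one two three\n"

# split() with no argument splits on any run of whitespace and ignores
# leading/trailing whitespace, including the newline.
words = line.split()

# Calling strip() first gives the same result, so it is redundant here.
stripped_words = line.strip().split()

# split(" ") is different: it splits on single spaces only and keeps "\n".
space_split = line.split(" ")

print(words)
print(space_split)
```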


While we're at it, there are a few other improvements you could make here:

  • Use a with statement to close the file instead of doing it manually.
  • Make this function return the list of words (so you can do various different things with it, instead of just printing it and returning nothing).
  • Take the filename as a parameter instead of hardcoding it (for similar flexibility).
  • Maybe turn the loop into a comprehension—but that would require an extra "flattening" step, so I'm not sure it's worth it.
  • If you don't want duplicate words, use a set rather than a list.
  • Depending on the use case, you often want to use rstrip() or rstrip('\n') to remove just the trailing newline, while leaving, say, paragraph indentation tabs or spaces. If you're looking for individual words, however, you probably don't want that.
  • You might want to filter out and/or split on non-alphabetical characters, so you don't get "that." as a word. Doing even this basic kind of natural-language processing is non-trivial, so I won't show an example here. (For example, you probably want "John's" to be a word, you may or may not want "jack-o-lantern" to be one word instead of three; you almost certainly don't want "two-three" to be one word…)
  • The self parameter is only needed in methods of classes. This doesn't appear to be in any class. (If it is, it's not doing anything with self, so there's no visible reason for it to be in a class. You might have some reason which would be visible in your larger program, of course.)
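A few of the ideas in that list can be sketched briefly. The lines list below is a made-up stand-in for the file, and the regular expression is only a rough assumption about what counts as a word (runs of ASCII letters and apostrophes), not a real natural-language tokenizer.

```python
import re

# Hypothetical stand-in for the lines of food.txt.
lines = ["apple pie, please.\n", "John's apple\n"]

# Comprehension with the "flattening" step: the nested for walks every
# word of every line.
words = [word for line in lines for word in line.split()]

# A set drops duplicates; sorted() turns it back into an ordered list.
unique_words = sorted(set(words))

# Very rough tokenizer: keep runs of ASCII letters and apostrophes, so
# "pie," becomes "pie" and "John's" stays in one piece.
tokens = sorted(set(re.findall(r"[A-Za-z']+", "".join(lines))))

print(unique_words)
print(tokens)
```

Note that the default sort puts "John's" before "apple", because uppercase letters compare lower than lowercase ones.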

So, anyway:

def sorting(filename):
    words = []
    with open(filename) as file_handle:
        for line in file_handle:
            words += line.split()
    return sorted(words)

print '\n'.join(sorting('food.txt'))

Use .strip(). It will remove whitespace by default. You can also pass other characters (like "\n") to strip as well. This will leave just the words.

Try this:

def sorting(self):
    words = []
    with open("food.txt") as f:
        for line in f:
            words.extend(line.split())
    return sorted(words, key=lambda word: word.lower())
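The key argument in that last line is what makes the sort case-insensitive. A small sketch with an invented word list shows the difference from the default ordering:

```python
words = ["banana", "apple", "Cherry"]

# Default ordering compares character codes, so uppercase sorts first.
default_order = sorted(words)

# Lowercasing inside the key function sorts case-insensitively
# without changing the words themselves.
folded_order = sorted(words, key=lambda word: word.lower())

print(default_order)
print(folded_order)
```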

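To keep print from adding its own newline in Python 2, put a trailing comma at the end of the statement. (This does not remove the "\n" that is already inside each line read from the file.)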

print sorted(file_handle),

In your code, I don't see that you are sorting the whole file, just the line. Use a list to collect all the words, and after you have read the file, sort them all.

