
How do I read, append and sort all the words of a text file in Python?

Open the file romeo.txt and read it line by line. For each line, split the line into a list of words using the split() function. The program should build a list of words. For each word on each line, check whether the word is already in the list and, if not, append it to the list. When the program completes, sort and print the resulting words in alphabetical order.

http://www.pythonlearn.com/code/romeo.txt

Here's my code:

fname = raw_input("Enter file name: ")
fh = open(fname)
for line in fh:
     for word in line.split():
          if word in line.split():
               line.split().append(word)
          if word not in line.split():
               continue
          print word

For some reason, it only returns the last word of the last line.

Before your loop, create a list in which to collect your words. Right now you are just discarding everything.

Your logic is also reversed: you are discarding the words you should be saving.

words = []
fname = raw_input("Enter file name: ")
fh = open(fname)
for line in fh:
     for word in line.split():
          if word not in words:
               words.append(word)
fh.close()

# Now you should sort the words list and continue with your assignment
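For reference, here is one way to finish the assignment along the lines of this answer. It is a minimal Python 3 sketch, assuming words are compared case-sensitively as in the code above; the function name unique_sorted_words is hypothetical, and it accepts any iterable of lines so it can be tried without a file.

```python
def unique_sorted_words(lines):
    """Collect each distinct word once, then return them sorted."""
    words = []
    for line in lines:
        for word in line.split():
            if word not in words:   # only append words not seen yet
                words.append(word)
    words.sort()                    # default order: by code point
    return words


# Usage with a real file (Python 3):
# with open(input("Enter file name: ")) as fh:
#     print(unique_sorted_words(fh))
```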

Try the following. It uses a set() to collect the unique words, and a regular expression to strip punctuation first. Each word is also lower-cased so that "The" and "the" are treated the same.

import re

word_set = set()
re_nonalpha = re.compile('[^a-zA-Z ]+')

fname = raw_input("Enter file name: ")

with open(fname, "r") as f_input:
    for line in f_input:
        line = re_nonalpha.sub(' ', line)  # Replace each run of non-letter, non-space characters with a space

        for word in line.split():
            word_set.add(word.lower())

word_list = list(word_set)
word_list.sort()
print word_list

This will display the following list:

['already', 'and', 'arise', 'bits', 'breaks', 'but', 'east', 'envious', 'fair', 'grief', 'has', 'is', 'it', 'juliet', 'kill', 'light', 'many', 'moon', 'pale', 'punctation', 'sick', 'soft', 'sun', 'the', 'this', 'through', 'too', 'way', 'what', 'who', 'window', 'with', 'yonder']

Or, as a one-liner:

sorted(set([w for l in open(fname) for w in l.split()]))
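The one-liner leaves closing the file to the garbage collector. A minimal sketch of the same idea that closes the file explicitly with a with block (the function name sorted_unique_words_in_file is hypothetical):

```python
def sorted_unique_words_in_file(path):
    """Return the distinct whitespace-separated words in a file, sorted."""
    with open(path) as fh:  # the file is closed when the block exits
        # A set comprehension drops duplicates; sorted() returns a new list
        return sorted({word for line in fh for word in line.split()})
```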

I think you misunderstand what line.split() is doing. line.split() returns a list containing the "words" in the string line. Here a "word" means a substring delimited by whitespace. So if line were equal to "Hello, World. I <3 Python", line.split() would return the list ["Hello,", "World.", "I", "<3", "Python"].
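To see how split() with no arguments (which splits on runs of any whitespace) differs from split(" ") (which splits on every single space), here is a small sketch using the example string above; the second input string is made up for illustration. Note that punctuation stays attached to the words either way.

```python
line = "Hello, World. I <3 Python"

# No argument: split on runs of whitespace, ignoring leading/trailing whitespace
words = line.split()
print(words)  # ['Hello,', 'World.', 'I', '<3', 'Python']

# Explicit " ": each single space is a separator, so a doubled space
# yields an empty string and the trailing newline stays on the last word
raw = "Hello,  World\n".split(" ")
print(raw)    # ['Hello,', '', 'World\n']
```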

When you write for word in line.split(), you are iterating through each element of that list, so the condition word in line.split() will always be true! (Likewise, line.split().append(word) appends to a brand-new list that is immediately thrown away.) What you really want is a cumulative list of "words you have already come across". At the top of the program, you would create it with DiscoveredWords = []. Then, for every word in every line, you would check:

if word not in DiscoveredWords:
    DiscoveredWords.append(word)
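A small sketch contrasting the original check with the cumulative-list check (the sample lines are made up for illustration, and discovered_words plays the role of DiscoveredWords):

```python
lines = ["the sun the moon", "the sun"]

# Original check: every word of a line is trivially in that line's split
always_true = all(word in line.split()
                  for line in lines for word in line.split())
print(always_true)  # True

# line.split() builds a new list on every call, so appending to it
# changes nothing that is kept
lines[0].split().append("extra")
print(lines[0].split())  # ['the', 'sun', 'the', 'moon']

# Cumulative check: remember words across all lines
discovered_words = []
for line in lines:
    for word in line.split():
        if word not in discovered_words:
            discovered_words.append(word)
print(discovered_words)  # ['the', 'sun', 'moon']
```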

Got it? :) Now, since it seems you are new to Python (welcome to the fun, by the way), here is how I would have written the code:

fname = raw_input("Enter file name: ")
with open(fname) as fh:
    words = [word for line in fh for word in line.strip().split()]
words = list(set(words))
words.sort()

Let's do a quick walkthrough of this code so that you can understand what is going on:

with open(fname) as fh is a handy idiom to remember. It ensures that your file gets closed! Once Python exits the with block, it closes the file for you automatically :D
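One way to confirm that the with block really closes the file is to check the file object's closed attribute afterwards. This sketch first writes a throwaway file with the standard tempfile module (used here only so the demo runs anywhere):

```python
import os
import tempfile

# Create a throwaway file to read from
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out:
    out.write("But soft what light\n")

with open(path) as fh:
    inside = fh.closed        # False: the file is open inside the block
    first_line = fh.readline()

after = fh.closed             # True: closed automatically on leaving the block
os.remove(path)

print(inside, after, first_line.split())
```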

words = [word for line in fh for word in line.strip().split()] is another handy trick. It is one of the more concise ways to get a list containing all of the words in a file! We are telling Python to build a list by taking every line in the file (for line in fh) and then every word in that line (for word in line.strip().split()). The strip() call is harmless but not strictly needed, since split() with no arguments already ignores leading and trailing whitespace.
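The nested comprehension reads left to right like nested for loops. A sketch showing that both forms produce the same list (a hard-coded list of lines stands in for the file):

```python
lines = ["But soft what light\n", "It is the east\n"]

# Comprehension form: the outer loop comes first, the inner loop second
words = [word for line in lines for word in line.strip().split()]

# Equivalent explicit loops
words_loop = []
for line in lines:
    for word in line.strip().split():
        words_loop.append(word)

print(words == words_loop)  # True
print(words)  # ['But', 'soft', 'what', 'light', 'It', 'is', 'the', 'east']
```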

words = list(set(words)) converts our list to a set and then back to a list. This is a quick way to remove duplicates, because a set in Python contains only unique elements.
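Converting to a set discards the original order, which does not matter here because the list is sorted next. If you ever need duplicates removed while keeping first-seen order, list(dict.fromkeys(...)) is a common alternative; it relies on dicts preserving insertion order, which is guaranteed since Python 3.7. A small sketch:

```python
words = ["the", "sun", "the", "moon", "sun"]

# set(): duplicates removed, order arbitrary
unique_any_order = list(set(words))

# dict.fromkeys(): duplicates removed, first-seen order kept (Python 3.7+)
unique_in_order = list(dict.fromkeys(words))

print(sorted(unique_any_order))  # ['moon', 'sun', 'the']
print(unique_in_order)           # ['the', 'sun', 'moon']
```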

Finally, we sort the list using words.sort().
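Something to watch for: sort() compares strings by code point, so every capitalized word comes before every lowercase word. If a case-insensitive alphabetical order is wanted instead, passing key=str.lower is one option (words.sort(key=str.lower) does the same in place). A sketch with made-up sample words:

```python
words = ["soft", "But", "east", "Juliet", "arise"]

# Default: uppercase letters sort before lowercase ones
print(sorted(words))                 # ['But', 'Juliet', 'arise', 'east', 'soft']

# Case-insensitive: compare the lower-cased form, keep the original strings
print(sorted(words, key=str.lower))  # ['arise', 'But', 'east', 'Juliet', 'soft']
```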

Hope this was helpful and instructive :)

words = list()
fname = input("Enter file name: ")
with open(fname) as fh:  # the file is closed once it has been read
    text = fh.read()

for word in text.split():
    if word in words:
        continue
    else:
        words.append(word)
words.sort()
print(words)
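The word in words test scans the whole list each time, which is fine for a short file like romeo.txt but gets slow on large inputs. A sketch of a variant that keeps a set alongside the list for fast membership checks (the function name unique_words is hypothetical):

```python
def unique_words(text):
    """Return distinct words in first-seen order, using a set for lookups."""
    seen = set()
    words = []
    for word in text.split():
        if word not in seen:  # average constant-time check instead of a list scan
            seen.add(word)
            words.append(word)
    return words


result = unique_words("the sun the moon the sun")
result.sort()
print(result)  # ['moon', 'sun', 'the']
```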
