Python - Unable to split lines from a txt file into words

My goal is to open a file, split it into unique words, and display that list (along with a count for each word). I think I have to split the file into lines, then split those lines into words and add them all into a list.

The problem is that my program either runs in an infinite loop and displays no results, or it reads only a single line and then stops. The file being read is the Gettysburg Address.

def uniquify( splitz, uniqueWords, lineNum ):
    for word in splitz:
        word = word.lower()
        if word not in uniqueWords:
            uniqueWords.append( word )

def conjunctionFunction():

    uniqueWords = []

    with open(r'C:\Users\Alex\Desktop\Address.txt') as f :
        getty = [line.rstrip('\n') for line in f]
    lineNum = 0
    lines = getty[lineNum]
    getty.append("\n")
    while lineNum < 20 :
        splitz = lines.split()
        lineNum += 1

        uniquify( splitz, uniqueWords, lineNum )
    print( uniqueWords )


conjunctionFunction()

Using your current code, the line:

lines = getty[lineNum]

should be moved inside the while loop.
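With that one change the loop advances through the file instead of re-splitting the same line. A minimal sketch of the corrected loop (using an inline sample list in place of Address.txt, and bounding the loop by `len(getty)` rather than the hard-coded 20; the unused `lineNum` parameter of `uniquify` is dropped):

```python
def uniquify(splitz, uniqueWords):
    # add each lower-cased word to the list once
    for word in splitz:
        word = word.lower()
        if word not in uniqueWords:
            uniqueWords.append(word)

def conjunctionFunction():
    # inline sample standing in for the contents of Address.txt
    getty = ["Four score and seven years ago",
             "our fathers brought forth on this continent"]
    uniqueWords = []
    lineNum = 0
    while lineNum < len(getty):
        lines = getty[lineNum]   # fetch the current line INSIDE the loop
        splitz = lines.split()
        lineNum += 1
        uniquify(splitz, uniqueWords)
    return uniqueWords

print(conjunctionFunction())
```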

You figured out what's wrong with your code, but nonetheless I would do this slightly differently. Since you need to keep track of the unique words and their counts, you should use a dictionary for this task:

wordHash = {}

with open(r'C:\Users\Alex\Desktop\Address.txt', 'r') as f:
    for line in f:
        line = line.rstrip().lower()

        for word in line.split():   # split the line into words, not characters
            if word not in wordHash:
                wordHash[word] = 1
            else:
                wordHash[word] += 1

print(wordHash)

def splitData(filename):
    # read the whole file and split it on whitespace
    return open(filename).read().split()

Easiest way to split a file into words :)

Assume inp is retrieved from a file:

inp = """Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense."""


data = inp.splitlines()

print(data)

_d = {}

for line in data:
    word_lst = line.split()
    for word in word_lst:
        if word in _d:
            _d[word] += 1
        else:
            _d[word] = 1

print(list(_d.keys()))

Output

['Beautiful', 'is', 'better', 'than', 'ugly.', 'Explicit', 'implicit.', 'Simple', 'complex.', 'Complex', 'complicated.', 'Flat', 'nested.', 'Sparse', 'dense.']

I recommend:

#!/usr/local/cpython-3.3/bin/python

import pprint
import collections

def genwords(file_):
    for line in file_:
        for word in line.split():
            yield word

def main():
    with open('gettysburg.txt', 'r') as file_:
        result = collections.Counter(genwords(file_))

    pprint.pprint(result)

main()

...but you could use re.findall instead of str.split to deal with punctuation better.
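For instance, one possible sketch of that variant (the pattern `[a-z']+` is an assumption about what counts as a word; the function name and the inline sample text are hypothetical, standing in for the file's contents):

```python
import collections
import re

def count_words(text):
    # lower-case the text, then treat runs of letters/apostrophes as words,
    # so trailing punctuation like "ugly." does not create separate keys
    return collections.Counter(re.findall(r"[a-z']+", text.lower()))

print(count_words("Beautiful is better than ugly. Ugly is honest."))
```

Unlike str.split, this counts "ugly." and "Ugly" as the same word.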
