Converting a text file to a string in Python

Question

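I'm new to Python and am trying to find the longest word in alice_in_wonderland.txt. I think I've set up a good system (see below), but my output returns a "word" made of several words joined by dashes. Is there a way to remove the dashes from the file input? For the text file, see here.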

Sample from the text file:

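'That's very important,' the King said, turning to the jury. They were just beginning to write this down on their slates, when the White Rabbit interrupted: 'Unimportant, your Majesty means, of course,' he said in a very respectful tone, but frowning and making faces at him as he spoke. 'Unimportant, of course, I meant,' the King hastily said, and went on to himself in an undertone, 'important--unimportant--unimportant--important--' as if he were trying which word sounded best.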

Code:

    # Read the whole file into one string
    with open("alice_in_wonderland.txt", "r") as myfile:
        string=myfile.read().replace('\n','')
    # Initialize list
    my_list = []
    # Split words into list
    for word in string.split(' '):
        my_list.append(word)
    # Initialize list
    uniqueWords = []
    # Fill in new list with unique words to shorten final printout
    for i in my_list:
        if i not in uniqueWords:
            uniqueWords.append(i)
    # Length of longest word
    count = 0
    # Longest word placeholder
    longest = ''
    for word in uniqueWords:
        if len(word)>count:
            longest = word
            count = len(longest)
    print(longest)

Answer 1

>>> import nltk # pip install nltk
>>> nltk.download('gutenberg')
>>> words = nltk.corpus.gutenberg.words('carroll-alice.txt')
>>> max(words, key=len) # find the longest word
'disappointment'
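
To see what `max(words, key=len)` does without downloading the corpus, here is a minimal sketch on a hand-made list. The list is purely illustrative and is not taken from the NLTK corpus; the point is only that `max` returns the first item of maximal length, so ties are resolved by position.

```python
# Illustrative word list (an assumption, not the real corpus contents).
words = ["Alice", "disappointment", "Rabbit", "affectionately"]

# key=len makes max() compare items by their length.
longest = max(words, key=len)
print(longest)  # disappointment (the first of the two 14-letter words)

# Collect every word that shares the maximum length.
all_longest = [w for w in words if len(w) == len(longest)]
print(all_longest)
```

If every longest word is needed rather than just the first, the list comprehension at the end is one simple way to get them.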

Answer 2

Here's one way to do it using re and mmap:

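import re
import mmap

# Open in binary mode: mmap exposes the file as bytes
with open('your alice in wonderland file', 'rb') as fin:
    mf = mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ)
    # The pattern must be a bytes pattern to search a bytes-like object
    words = re.finditer(rb'\w+', mf)
    print(max((word.group() for word in words), key=len).decode())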

# disappointment

This is much more efficient than loading the whole file into physical memory.
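
A self-contained sketch of the same idea, for trying it out without the real file. The function name `longest_word_mmap` and the temporary sample file are assumptions made for illustration; the sketch also assumes Python 3, where the memory-mapped data is bytes and so needs a bytes pattern.

```python
import mmap
import os
import re
import tempfile


def longest_word_mmap(path):
    """Return the longest run of word characters in the file at `path`."""
    with open(path, "rb") as fin:
        # Length 0 maps the whole file; this fails on an empty file.
        with mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ) as mf:
            matches = re.finditer(rb"\w+", mf)
            longest = max((m.group() for m in matches), key=len, default=b"")
    # A bytes pattern without flags only matches ASCII word characters.
    return longest.decode("ascii")


# Write a small sample to a temporary file (hypothetical stand-in for the book).
sample = b"important--unimportant--unimportant--important--\nas if he were trying\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
try:
    result = longest_word_mmap(tmp.name)
    print(result)  # unimportant
finally:
    os.remove(tmp.name)
```

Because `\w+` never matches a dash, the dash-joined run from the question is split into separate words here without any extra replace step.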

Answer 3

Use str.replace to replace the dashes with spaces (or whatever you want). To do that, just add another call to replace after the first one on line 3:

string=myfile.read().replace('\n','').replace('-', ' ')
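
As a rough sketch of how that replacement fits into the whole task, the function below works on a string rather than a file so it can be run directly. The name `longest_word` is hypothetical, and, unlike the original line, it replaces newlines with a space instead of an empty string so that words at line ends are not glued together; that change is an assumption, not part of the answer above.

```python
def longest_word(text):
    """Return the longest whitespace-separated word after replacing dashes."""
    # Turn newlines and dashes into spaces so they act as word separators.
    cleaned = text.replace("\n", " ").replace("-", " ")
    # split() with no argument collapses runs of whitespace and skips empties.
    words = cleaned.split()
    return max(words, key=len, default="")


sample = (
    "went on to himself in an undertone,\n"
    "'important--unimportant--unimportant--important--' as if"
)
result = longest_word(sample)
print(result)  # unimportant
```

Punctuation such as commas and quotes stays attached to the words in this sketch, so it can still affect which word counts as longest.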

3 solutions

Solution 1
3 accepted 2014-08-16 23:40:40

Solution 2
2 2014-08-16 23:22:10

Solution 3
0 2014-08-16 23:04:09
