Converting a Text File to a String in Python

Question

I am new to Python and am trying to find the longest word in alice_in_wonderland.txt. I think I have a good system set up (see below), but my output is returning a "word" made of several words joined by dashes. Is there some way to remove the dashes when reading in the file? For the text file, visit here.

Sample from the text file:

That's very important,' the King said, turning to the jury. They were just beginning to write this down on their slates, when the White Rabbit interrupted: UNimportant, your Majesty means, of course,' he said in a very respectful tone, but frowning and making faces at him as he spoke. " UNimportant, of course, I meant,' the King hastily said, and went on to himself in an undertone, important--unimportant-- unimportant--important--' as if he were trying which word sounded best."

Code:

    #String input
    with open("alice_in_wonderland.txt", "r") as myfile:
        string=myfile.read().replace('\n','')
    #initialize list
    my_list = []
    #Split words into list
    for word in string.split(' '):
        my_list.append(word)
    #initialize list
    uniqueWords = []
    #Fill in new list with unique words to shorten final printout
    for i in my_list:
        if i not in uniqueWords:
            uniqueWords.append(i)
    #Length of longest word
    count = 0
    #Longest word placeholder
    longest = ''
    for word in uniqueWords:
        if len(word)>count:
            longest = word
            count = len(longest)
    #Print once, after every word has been checked
    print(longest)

Answer 1

    >>> import nltk  # pip install nltk
    >>> nltk.download('gutenberg')
    >>> words = nltk.corpus.gutenberg.words('carroll-alice.txt')
    >>> max(words, key=len)  # find the longest word
    'disappointment'
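`max(words, key=len)` returns only the first word that reaches the maximum length, so ties are silently dropped. A minimal sketch for listing every distinct word of maximal length; the helper name `longest_words` and the short word list are hypothetical stand-ins for the NLTK corpus:

```python
def longest_words(words):
    """Return every distinct word of maximal length, in first-seen order (hypothetical helper)."""
    words = list(words)  # accept any iterable, e.g. an NLTK corpus view
    if not words:
        return []
    max_len = len(max(words, key=len))
    seen = set()
    result = []
    for w in words:
        if len(w) == max_len and w not in seen:
            seen.add(w)
            result.append(w)
    return result


# Hypothetical sample; with NLTK you could pass nltk.corpus.gutenberg.words('carroll-alice.txt')
sample = ['Alice', 'was', 'beginning', 'to', 'get', 'very', 'tired', 'important', 'beginning']
print(max(sample, key=len))   # beginning (only the first of the tied words)
print(longest_words(sample))  # ['beginning', 'important']
```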

Answer 2

Here's one way, using re and mmap:

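    import re
    import mmap

    # Binary mode: the mmap object exposes bytes, so the regex must be a bytes pattern too
    with open('alice_in_wonderland.txt', 'rb') as fin:
        mf = mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ)
        words = re.finditer(br'\w+', mf)
        longest = max((word.group() for word in words), key=len)
        mf.close()
    print(longest.decode())

    # disappointment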

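Memory-mapping lets the regex scan the file's pages as the operating system loads them, instead of first copying the whole file into a Python string; the savings matter mainly for files much larger than this one.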
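The same mmap-and-regex idea can be wrapped in a function and checked against a small temporary file. This is a sketch assuming Python 3.4 or later; the function name `longest_word_in_file` is hypothetical. Because the pattern is applied to bytes, `\w` matches only ASCII letters, digits and underscore, so a dash ends a word, which is what separates the dashed phrase from the question.

```python
import mmap
import os
import re
import tempfile


def longest_word_in_file(path):
    """Return the longest run of ASCII word characters in the file (hypothetical helper).

    The file is memory-mapped, so the regex scans the mapped bytes rather than
    a string built by reading the whole file. Returns '' for an empty file,
    because mmap cannot map a zero-length file.
    """
    if os.path.getsize(path) == 0:
        return ''
    with open(path, 'rb') as fin:
        with mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ) as mf:
            matches = (m.group() for m in re.finditer(br'\w+', mf))
            longest = max(matches, key=len, default=b'')
    return longest.decode('ascii')


# Usage: write the dashed phrase from the question to a temporary file and scan it
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("important--unimportant-- unimportant--important--'")
    path = tmp.name
print(longest_word_in_file(path))  # unimportant
os.remove(path)
```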

Answer 3

Use str.replace to replace the dashes with spaces (or with whatever you want). To do this, chain a second replace call after the first one on line 3 of your code:

    string=myfile.read().replace('\n','').replace('-', ' ')
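Applied to the dashed phrase from the question, the extra replace turns each dash into a space. A minimal sketch of the effect; it switches to `split()` with no argument (an adjustment not in the answer above) because `split(' ')` would leave empty strings wherever `--` became two spaces:

```python
# The phrase from the question's sample text
text = "important--unimportant-- unimportant--important--'"

# Without the fix, split(' ') keeps each dashed run together as one "word"
print(text.split(' '))
# ['important--unimportant--', "unimportant--important--'"]

# Replace dashes with spaces; split() with no argument collapses the
# repeated spaces that '--' leaves behind, so no empty strings appear
fixed = text.replace('-', ' ')
words = fixed.split()
print(words)
# ['important', 'unimportant', 'unimportant', 'important', "'"]
print(max(words, key=len))
# unimportant
```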

3 answers

solution1 3 ACCEPTED 2014-08-16 23:40:40

solution2 2 2014-08-16 23:22:10

solution3 0 2014-08-16 23:04:09