简体   繁体   中英

Finding the longest word in a .txt file without punctuation marks

I am doing Python file I/O exercises and albeit made a huge progress on an exercise in which I try to find the longest words in each line of a .txt file, I can't get rid of the punctuation marks .

Here is the code I have:

with open("original-3.txt", 'r') as file1:
lines = file1.readlines()
for line in lines:
    if not line == "\n":
        print(max(line.split(), key=len))

This is the output I get

This is the original-3.txt file where I am reading the data from

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.

"Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!"

He took his vorpal sword in hand:
Long time the manxome foe he sought,
So rested he by the Tumtum tree,
And stood a while in thought.

And, as in uffish thought he stood,
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
And burbled as it came!

One two! One two! And through and through
The vorpal blade went snicker-snack!
He left it dead, and with its head
He went galumphing back.

"And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!"
"Oh frabjous day! Callooh! Callay!"
He chortled in his joy.

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

As you can see, I am getting the punctuation marks like ["," ";" "?" "!"]["," ";" "?" "!"]

How do you think I can only get the words themselves?

Thank you

Using Regex it is very easy to get what is the length of longest word :

import re

for line in lines:
    found_strings = re.findall(r'\w+', line)
    print(max([len(txt) for txt in found_strings]))

You have to strip those characters from the words:

with open("original-3.txt", 'r') as file1:
    lines = file1.readlines()
for line in lines:
    if not line == "\n":
        print(max(word.strip(",?;!\"") for word in line.split()), key=len))

or you use regular expressions to extract everything that looks like a word (ie consists of letters):

import re


for line in lines: 
    words = re.findall(r"\w+", line) 
    if words: 
        print(max(words, key=len)) 

This solution does not use regular expressions. It splits the line into words, and then sanitizes each word so that it only contains alphabetical characters.

with open("original-3.txt", 'r') as file1:
    lines = file1.readlines()
    for line in lines:
        if not line == "\n":
            words = line.split()
            for i, word in enumerate(words):
                words[i] = "".join([letter for letter in word if letter.isalpha()])
            print(max(words, key=len))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM