Finding longest word in a txt file

I am trying to write a function that takes a filename as a parameter and returns the longest word in the file, prefixed with its line number. This is what I have so far, but it does not produce the expected output.

def word_finder(file_name):
    with open(file_name) as f:
        lines = f.readlines()
        line_num = 0
        longest_word = None
        for line in lines:
            line = line.strip()
            if len(line) == 0:
                return None
            else:
                line_num += 1
                tokens = line.split()
                for token in tokens:
                    if longest_word is None or len(token) > len(longest_word):
                        longest_word = token
            return (str(line_num) + ": " + str(longest_word))

I think this is the shortest way to find the longest word; correct me if I'm wrong.

def wordFinder(filename):
    with open(filename, "r") as f:
        words = f.read().split()  # split() returns a list of the whitespace-separated words in the file
        longestWord = max(words, key=len)  # key=len makes max() compare the words by length
        print(longestWord)  # prints the longest word

Issue

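Exactly what ewong diagnosed:

The last return statement is indented too deeply. It sits inside the for loop, so it runs at the end of the first iteration.

Currently, the function returns:

  • the longest word in the first line only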
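
A minimal sketch of the effect, using an in-memory list of lines instead of a file. The two helper names are hypothetical and only illustrate where the return sits:

```python
def longest_return_inside_loop(lines):
    # Hypothetical helper: mirrors the bug, with return inside the for body.
    longest = None
    for line in lines:
        for token in line.split():
            if longest is None or len(token) > len(longest):
                longest = token
        return longest  # runs at the end of the first iteration


def longest_return_after_loop(lines):
    # Hypothetical helper: same logic, return dedented so it runs after the loop.
    longest = None
    for line in lines:
        for token in line.split():
            if longest is None or len(token) > len(longest):
                longest = token
    return longest


sample = ["a bb", "ccc dddd", "e"]
print(longest_return_inside_loop(sample))  # bb
print(longest_return_after_loop(sample))   # dddd
```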

Solution

The return should be aligned with the for loop's column, so that it executes after the loop has finished.

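def word_finder(file_name):
    with open(file_name) as f:
        lines = f.readlines()
        line_num = 0
        longest_word = None
        longest_line_num = None
        for line in lines:
            line_num += 1  # count every line, including blank ones
            line = line.strip()
            if len(line) == 0:
                continue  # skip blank lines instead of returning early
            tokens = line.split()
            for token in tokens:
                if longest_word is None or len(token) > len(longest_word):
                    longest_word = token
                    longest_line_num = line_num  # remember where the word was found
            # return here would exit the loop too early after 1st line
        # loop ended
        if longest_word is None:
            return None  # the file contains no words
        return (str(longest_line_num) + ": " + str(longest_word))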

Then:

  • the longest word in the file with the line number attached to the front of it.

Improved

def word_finder(file_name):
    with open(file_name) as f:
        line_word_longest = None  # global max: tuple of (line_number, longest_word)
        for i, line in enumerate(f, start=1):  # 1-based line number and line content
            line = line.strip()
            if len(line) > 0:   # split words only if the line is not blank
                max_token = max(line.split(), key=len)  # longest token on this line
                if line_word_longest is None or len(max_token) > len(line_word_longest[1]):
                    line_word_longest = (i, max_token)
        # loop ended
        if line_word_longest is None:
            return "No longest word found!"
        return f"{line_word_longest[0]}: '{line_word_longest[1]}' ({len(line_word_longest[1])} chars)"


Having a bit of fun with cutting this down:

def word_finder(file_name):
    with open(file_name) as f:
        lines = [{ 'num': i,
                   'words': (ws := line.split()),
                   'max': max(ws, key=len) if ws else '',
                   'line': line }
                 for i, line in enumerate(f, start=1)]
        m = max(lines, key=lambda l: len(l['max']))
        return f"{m['num']}: '{m['max']}'"

We use a list comprehension to turn each line into a dictionary holding its line number, the words it contains, its longest word, and the original line. When computing the longest word, we substitute an empty string if ws is empty, which avoids the ValueError that max raises when given an empty sequence.

It's then straightforward to use max to find the line with the longest word.
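
The same idea also works without the intermediate dictionaries. As a sketch, assuming words are whitespace-separated and line numbers start at 1, a generator of (line number, word) pairs can be passed straight to max. The name word_finder_pairs and the default=None handling for files without words are illustrative additions, not part of the code above:

```python
import os
import tempfile


def word_finder_pairs(file_name):
    # Hypothetical variant: builds (line_number, word) pairs for every word in the file.
    with open(file_name) as f:
        pairs = ((num, word)
                 for num, line in enumerate(f, start=1)
                 for word in line.split())
        # default=None avoids ValueError when the file contains no words.
        best = max(pairs, key=lambda p: len(p[1]), default=None)
    if best is None:
        return None
    return f"{best[0]}: '{best[1]}'"


# Usage on a throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a bb\n\nccc dddd\ne\n")
print(word_finder_pairs(tmp.name))  # 3: 'dddd'
os.remove(tmp.name)
```

Because the generator is lazy, max has to be called while the file is still open, which is why it sits inside the with block.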
