Finding the longest word in a txt file

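I'm trying to create a function that takes a file name as an argument and returns the longest word in the file, with its line number prepended. This is what I have so far, but it doesn't produce the expected output.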

def word_finder(file_name):
    with open(file_name) as f:
        lines = f.readlines()
        line_num = 0
        longest_word = None
        for line in lines:
            line = line.strip()
            if len(line) == 0:
                return None
            else:
                line_num += 1
                tokens = line.split()
                for token in tokens:
                    if longest_word is None or len(token) > len(longest_word):
                        longest_word = token
            return (str(line_num) + ": " + str(longest_word))

I think this is the shortest way to find the longest word; please correct me if it isn't:

def wordFinder(filename):
    with open(filename, "r") as f:
        words = f.read().split()  # split() with no arguments splits on any whitespace, newlines included
        # key=len compares words by length; default=None avoids a ValueError on an empty file
        longestWord = max(words, key=len, default=None)
    return longestWord  # returns the word only; the line number is not tracked
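The same max(..., key=len) idea can be extended to report the line number as well, which the question asks for. This is a minimal sketch, assuming 1-based line numbers and that a tie goes to the first occurrence; the function name longest_word_with_line is hypothetical:

```python
def longest_word_with_line(filename):
    """Return 'line: word' for the longest word in the file, or None if it has no words."""
    with open(filename) as f:
        # Lazily generate (line_number, word) pairs; line numbers start at 1.
        pairs = ((num, word)
                 for num, line in enumerate(f, start=1)
                 for word in line.split())
        # Compare pairs by word length only; default=None covers a file with no words.
        best = max(pairs, key=lambda pair: len(pair[1]), default=None)
    if best is None:
        return None
    return f"{best[0]}: {best[1]}"
```

For a file containing the lines "a bb", an empty line, "ccc dddd" and "ee", this returns "3: dddd". Note that max() is called inside the with block, because the generator reads from f lazily.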

Problem

Exactly what ewong diagnosed:

The last return statement is indented too deeply.

Currently it returns:

  • only the longest word in the first line
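To see why, here is a toy comparison (the function names are hypothetical, not from the question) of a return placed inside the for loop body, as in the question, versus one placed after the loop:

```python
def longest_inside(lines):
    longest = ""
    for line in lines:
        for word in line.split():
            if len(word) > len(longest):
                longest = word
        return longest  # inside the outer loop body: exits after the first line

def longest_after(lines):
    longest = ""
    for line in lines:
        for word in line.split():
            if len(word) > len(longest):
                longest = word
    return longest  # aligned with the for loop: runs after every line is processed

sample = ["a bb", "ccc dddd"]
print(longest_inside(sample))  # bb
print(longest_after(sample))   # dddd
```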

Solution

The return should be aligned with the column of the for loop, so that it runs after the loop has finished.

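def word_finder(file_name):
    with open(file_name) as f:
        lines = f.readlines()
        line_num = 0
        longest_word = None
        longest_line = None  # line number where longest_word was found
        for line in lines:
            line_num += 1  # count every line, blank ones included
            line = line.strip()
            if len(line) == 0:
                continue  # skip blank lines; `return None` here would abort the whole search
            tokens = line.split()
            for token in tokens:
                if longest_word is None or len(token) > len(longest_word):
                    longest_word = token
                    longest_line = line_num  # remember where the current longest word was found
            # return here would exit the loop too early after 1st line
        # loop ended
        return str(longest_line) + ": " + str(longest_word)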

After the fix it returns:

  • the longest word in the whole file, prefixed with its line number.

Improvements

def word_finder(file_name):
    with open(file_name) as f:
        line_word_longest = None  # global max: tuple of (line number, longest word)
        for i, line in enumerate(f, start=1):  # 1-based line number and line content
            line = line.strip()
            if len(line) > 0:  # split words only if the line is not blank
                max_token = max(line.split(), key=len)  # longest token on this line
                if line_word_longest is None or len(max_token) > len(line_word_longest[1]):
                    line_word_longest = (i, max_token)
        # loop ended
        if line_word_longest is None:
            return "No longest word found!"
        return f"{line_word_longest[0]}: '{line_word_longest[1]}' ({len(line_word_longest[1])} chars)"


Reducing this for a bit of fun:

def word_finder(file_name):
    with open(file_name) as f:  # use the parameter instead of a hard-coded "test.c"
        lines = [{ 'num': i,
                   'words': (ws := line.split()),  # assignment expression: Python 3.8+
                   'max': max(ws, key=len) if ws else '',
                   'line': line }
                 for i, line in enumerate(f, start=1)]  # 1-based line numbers
        m = max(lines, key=lambda l: len(l['max']))
        return f"{m['num']}: '{m['max']}'"

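We use a list comprehension to turn each line into a dictionary describing its line number, all the words it contains, its longest word, and the original line. When computing the longest word, if ws is empty we simply insert an empty string, which avoids the exception max raises when it is handed an empty sequence.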
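As a quick illustration of that guard: a blank line splits into an empty list, and max() raises ValueError on an empty sequence. The snippet below shows the failure, the conditional expression used above, and the built-in default= keyword of max() (Python 3.4+) as an equivalent alternative:

```python
ws = "   \n".split()  # a blank line splits into an empty list
print(ws)  # []

try:
    max(ws, key=len)
except ValueError as exc:
    print("ValueError:", exc)  # max() refuses an empty sequence

# The conditional expression from the answer: fall back to '' for empty lines.
longest = max(ws, key=len) if ws else ''
print(repr(longest))  # ''

# Equivalent alternative: max() accepts a default for empty iterables.
print(repr(max(ws, key=len, default='')))  # ''
```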

Finding the line with the longest word is then a simple call to max.
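One detail worth knowing about that call: when several items share the maximum key, max() returns the first one it encounters, so with this approach the earliest line wins a tie. A small sketch with made-up dictionaries shaped like the ones built above (only the 'num' and 'max' keys are included):

```python
lines = [
    {'num': 1, 'max': 'apple'},
    {'num': 2, 'max': 'mango'},  # same length as 'apple'
    {'num': 3, 'max': 'fig'},
]
# Compare entries by the length of their longest word; ties keep the first entry.
m = max(lines, key=lambda entry: len(entry['max']))
print(f"{m['num']}: '{m['max']}'")  # 1: 'apple'
```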


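Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.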

 
© 2020-2024 STACKOOM.COM