Finding the max length of a word using MapReduce

I need to find the longest word or words in a txt file using MapReduce. I have written the following mapper and reducer, but the reducer builds the entire dictionary, with len(word) as the key and the words as the values. I need help writing the code to show only the maximum length and the corresponding words. Here is my code:

"""mapper.py"""
import sys

# Emit one "length<TAB>word" line per whitespace-separated token.
for line in sys.stdin:
    for word in line.strip().split():
        print('%s\t%s' % (len(word), word))



"""reducer.py"""
import sys

# Group the words by length; note that the keys are still strings here.
results = {}
for line in sys.stdin:
    index, value = line.strip().split('\t')
    if index not in results:
        results[index] = value
    else:
        results[index] += ' '
        results[index] += value

I am stuck at this point: I don't know how to continue the code to get the max(key) with its corresponding words.

Input file: How Peace Begins ? Peace begins with saying sorry, Peace begins with not hurting others, Peace begins with honesty ,trust and dedications, Peace begins with showing cooperation and respect. World Peace Begins with Me !

Expected output: The longest word has 11 characters. The words are: dedications cooperation

I am not sure what you are doing with stdin or why you are importing sys. Also, the sample input file doesn't seem to be in csv format but is just a simple text file. As I understand your problem, you want to read an input file, measure the length of each word, report the maximum word length, and list the words that meet this criterion. With this in mind, this is how I would proceed:

inputFile = r'sampleMapperText.txt'
with open(inputFile, 'r') as f:
    reslt = dict()  # keys = word lengths, values = list of words of that length
    text = f.read().split('\n')
    for line in text:
        words = line.split()
        for w in words:
            # Create the list for this length on first use, then append.
            reslt.setdefault(len(w), []).append(w)
    maxLen = max(reslt)  # largest word length seen
    print(f"Max Word Length = {maxLen}, Longest words = {', '.join(reslt[maxLen])}")

Running this code produces:

Max Word Length = 12, Longest words = dedications,
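
Note that the trailing comma in that output is counted as part of the word, because split() separates on whitespace only, so the result is 12 rather than the expected 11. Also, in the question's reducer the keys are still strings, so taking max() over them would compare them as text rather than as numbers. The following is a minimal sketch of one way to handle both points while keeping a map/reduce shape; the function names map_words, reduce_longest and parse_mapper_output are hypothetical, and stripping string.punctuation from the edges of each token is an assumption about what should count as a word.

```python
import string


def map_words(lines):
    """Mapper step: yield (length, word) pairs with edge punctuation stripped."""
    for line in lines:
        for token in line.split():
            word = token.strip(string.punctuation)
            if word:  # skip tokens that were only punctuation, such as "?"
                yield len(word), word


def parse_mapper_output(lines):
    """Parse 'length<TAB>word' lines, converting the key to int."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if value:  # ignore blank or malformed lines
            yield int(key), value


def reduce_longest(pairs):
    """Reducer step: keep only the words whose length is the running maximum."""
    max_len = 0
    longest = []
    for length, word in pairs:
        if length > max_len:
            # A longer word was found, so restart the list.
            max_len, longest = length, [word]
        elif length == max_len and word not in longest:
            longest.append(word)
    return max_len, longest


if __name__ == "__main__":
    # Sample input taken from the question.
    sample = [
        "How Peace Begins ? Peace begins with saying sorry, "
        "Peace begins with not hurting others, "
        "Peace begins with honesty ,trust and dedications, "
        "Peace begins with showing cooperation and respect. "
        "World Peace Begins with Me !"
    ]
    max_len, longest = reduce_longest(map_words(sample))
    print(f"The longest word has {max_len} characters.")
    print("The words are:", " ".join(longest))
```

In a streaming job, the reducer would presumably call reduce_longest(parse_mapper_output(sys.stdin)) instead of reading the sample list, with the mapper printing each pair from map_words as a tab-separated line. Because the reducer only tracks the current maximum, it never needs to hold the full dictionary of lengths in memory.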

