
Python: find the max value with mrjob

I would like to find the max value in a list using mrjob. When I run this, it always shows this error:

No configs found; falling back on auto-configuration; No configs specified for inline runner

I'd like to know what it means.

class MRWordCounter(MRJob):

    def mapper(self, key, line):
        num = csv_readline(line)
        yield num, 1

    def reducer(self, word, compare):
        num_list = []
        for value in compare:
            if value == max(compare):
                value = num_list
                yield word, num_list
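
A note on the snippet above: the quoted lines look like mrjob's informational notice that no mrjob.conf file was found, rather than a failure of the job itself. Separately, the reducer has a problem of its own. Assuming the values reach the reducer as a one-shot iterator (mrjob's documentation describes them as a generator), calling max(compare) inside a loop over compare uses up the remaining values. The sketch below reproduces that with a plain generator; values_stream is a hypothetical stand-in and is not part of mrjob.

```python
def values_stream():
    # Hypothetical stand-in for the one-shot iterator a reducer gets for one key.
    yield from [3, 9, 4]


compare = values_stream()
first = next(compare)            # what the for loop would take first
largest_of_rest = max(compare)   # max() consumes every remaining value
leftover = list(compare)         # nothing is left for later loop iterations

# Consuming the iterator exactly once avoids the problem.
largest = max(values_stream())
```

So inside a reducer it is usually enough to call max() once on the values and yield that result, instead of comparing each value against max() in a loop.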

You can use this method instead:

# Find the most frequently used word
# Import dependencies
from mrjob.job import MRJob
from mrjob.step import MRStep
import re

WORD_RE = re.compile(r"[\w']+")


class MRMostUsedWord(MRJob):

    def mapper_get_words(self, _, line):
        # yield each word in the line
        for word in WORD_RE.findall(line):
            yield (word.lower(), 1)

    def combiner_count_words(self, word, counts):
        # sum the words we've seen so far
        yield (word, sum(counts))

    def reducer_count_words(self, word, counts):
        # send all (num_occurrences, word) pairs to the same reducer.
        # num_occurrences is so we can easily use Python's max() function.
        yield None, (sum(counts), word)

    # discard the key; it is just None
    def reducer_find_max_word(self, _, word_count_pairs):
        # each item of word_count_pairs is (count, word),
        # so yielding one results in key=counts, value=word
        yield max(word_count_pairs)

    def steps(self):
        return [
            MRStep(mapper=self.mapper_get_words,
                   combiner=self.combiner_count_words,
                   reducer=self.reducer_count_words),
            MRStep(reducer=self.reducer_find_max_word)
        ]


if __name__ == '__main__':
    MRMostUsedWord.run()

What it does, in short:

  • map the words.
  • combine the count for each word.
  • flip the key, value pair.
  • reduce to find the most frequently occurring word.

To run the code, save the text file and the Python script in the same folder, and then run:

    python3 xyz.py xyz.txt
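
To see what each step passes to the next, the same data flow can be traced in plain Python without mrjob. This is only a sketch of the logic, not how mrjob executes the job; most_used_word and sample_lines are names made up for the illustration.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[\w']+")


def most_used_word(lines):
    # Step 1, mapper: emit (word, 1) for every word on every line.
    pairs = [(word.lower(), 1)
             for line in lines
             for word in WORD_RE.findall(line)]

    # Step 1, combiner and reducer: sum the counts for each word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n

    # Step 1 reducer output: flip each pair to (count, word). In the job
    # these all share the key None so they reach a single reducer.
    flipped = [(n, word) for word, n in counts.items()]

    # Step 2, reducer: tuples compare by count first, then by word.
    return max(flipped)


sample_lines = ["the cat and the hat", "The end"]
result = most_used_word(sample_lines)
print(result)
```

Because max() compares tuples element by element, putting the count first is what makes the flip useful. If two words tie on count, the word that sorts later alphabetically wins.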
