
How to run multiple mrjob tasks with different parameters

I have a job like this:

from mrjob.job import MRJob
from mrjob.step import MRStep
import urllib
import re
import httpagentparser

UA_STRING = re.compile(MYSUPERCOMPLEXREGEX)

class MRReferralAnalysis(MRJob):

    def mapper(self, _, line):

        for group in UA_STRING.findall(line):

            ua = httpagentparser.simple_detect(group)
            yield (ua, 1)

    def reducer(self, itemOfInterest, counts):

        yield (sum(counts), itemOfInterest)

    def steps(self):
        return [
            MRStep( mapper=self.mapper,
                    reducer=self.reducer)
        ]

if __name__ == '__main__':
    MRReferralAnalysis.run()

Now I want to run this mrjob program multiple times (about two dozen), each time with different parameters that are read from another file and passed into MYSUPERCOMPLEXREGEX. Is that even possible with mrjob, and if so, how should I schedule the tasks? Or should I write a wrapper program that triggers the jobs?

Wrap the MRReferralAnalysis.run() call in a loop, and read in your configuration immediately before the loop. The job will then run multiple times.
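If a separate wrapper program is preferred, the same loop can live in a small driver script that launches the job once per pattern. The following is only a sketch under several assumptions that are not in the original post: the job is saved as `mr_referral_analysis.py`, the patterns file holds one regex per line, and the job accepts a hypothetical `--pattern` option. In mrjob 0.6 and later such an option would be declared in the job's `configure_args` with `add_passthru_arg` and compiled from `self.options.pattern`. Passing the regex on the command line rather than through a module-level global also means it reaches the task processes when the job runs on Hadoop or EMR.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical filename for the job script shown in the question.
JOB_SCRIPT = "mr_referral_analysis.py"


def load_patterns(path):
    """Return the non-empty lines of the patterns file, one regex per line."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Surrounding whitespace is stripped; blank lines are skipped.
    return [line.strip() for line in lines if line.strip()]


def build_command(pattern, input_path):
    """Build the argv list for one job run.

    `--pattern` is an assumed passthrough option that the job itself
    would have to define; it is not a built-in mrjob flag.
    The `--pattern=<value>` form keeps regexes that start with "-" intact.
    """
    return [sys.executable, JOB_SCRIPT, "--pattern=" + pattern, input_path]


def run_all(patterns_file, input_path, output_dir):
    """Run the job once per pattern, writing each run's stdout to its own file."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for index, pattern in enumerate(load_patterns(patterns_file)):
        out_file = out_dir / f"run_{index:02d}.txt"
        with out_file.open("w", encoding="utf-8") as handle:
            # An argv list (no shell) avoids quoting problems with regexes.
            # check=True stops the loop if a run exits with an error.
            subprocess.run(build_command(pattern, input_path),
                           stdout=handle, check=True)


if __name__ == "__main__":
    # Example: python run_all.py patterns.txt access.log results/
    run_all(sys.argv[1], sys.argv[2], sys.argv[3])
```

The runs here are sequential; whether to launch them in parallel depends on the cluster and is left out of the sketch.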
