How to run multiple mrjob tasks with different parameters
I have a job like this:
from mrjob.job import MRJob
from mrjob.step import MRStep
import urllib
import re
import httpagentparser

UA_STRING = re.compile(MYSUPERCOMPLEXREGEX)


class MRReferralAnalysis(MRJob):

    def mapper(self, _, line):
        for group in UA_STRING.findall(line):
            ua = httpagentparser.simple_detect(group)
            yield (ua, 1)

    def reducer(self, itemOfInterest, counts):
        yield (sum(counts), itemOfInterest)

    def steps(self):
        return [
            MRStep(mapper=self.mapper,
                   reducer=self.reducer)
        ]


if __name__ == '__main__':
    MRReferralAnalysis.run()
Now I want to run this mrjob program multiple times (about two dozen times), with varying parameters that are read from another file and passed into MYSUPERCOMPLEXREGEX. Is that even possible with mrjob, and if so, how do I schedule the tasks? Or should I write a wrapper program that triggers the jobs?
Wrap the MRReferralAnalysis.run() call in a loop, and read in your configuration immediately before the loop. The job will then run once per iteration. Note that run() takes its arguments from sys.argv, so each iteration has to supply its own parameter value for the runs to differ.
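A loop of this kind could look roughly like the sketch below, which uses mrjob's programmatic interface (constructing the job with args=[...], then make_runner(), runner.run() and parse_output(runner.cat_output()), available in mrjob 0.6 and later) instead of calling run() repeatedly. Several things here are assumptions rather than part of the question: the helper names load_patterns, build_args and run_all, the one-regex-per-line file format, and the --ua-pattern option. For that option to work, the job class would have to declare it itself, for example with add_passthru_arg inside configure_args, and compile the regex from self.options in mapper_init rather than at module level. The mrjob documentation also advises keeping such a driver in a separate file from the job class.

```python
import re
from pathlib import Path


def load_patterns(path):
    """Return one regex string per non-empty, non-comment line of the file."""
    patterns = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        re.compile(line)  # fail early if a pattern is not a valid regex
        patterns.append(line)
    return patterns


def build_args(pattern, input_path, runner="inline"):
    """Build the command-line style argument list for a single job run.

    '--ua-pattern' is a hypothetical option; the job class must declare it.
    """
    return ["-r", runner, "--ua-pattern", pattern, input_path]


def run_all(job_class, patterns_path, input_path):
    """Run job_class once per pattern and collect each run's output pairs."""
    results = {}
    for pattern in load_patterns(patterns_path):
        job = job_class(args=build_args(pattern, input_path))
        with job.make_runner() as runner:
            runner.run()
            # parse_output turns the raw output chunks into (key, value) pairs
            results[pattern] = list(job.parse_output(runner.cat_output()))
    return results
```

From a separate driver script this would be called as something like run_all(MRReferralAnalysis, "patterns.txt", "access.log"), where both file names are placeholders. The runs execute one after another; passing a different runner name to build_args would be the place to switch from local runs to a cluster.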