Where does sys.stdout.write() go in an mrjob mapper?
mrjob.conf:
runners:
  emr:
    aws_access_key_id: **
    aws_secret_access_key: **
    aws_region: us-east-1
    aws_availability_zone: us-east-1a
    ec2_key_pair: scrapers2
    ec2_key_pair_file: ~/arachnid.pem
    ec2_instance_type: c3.8xlarge
    ec2_master_instance_type: c3.8xlarge
    num_ec2_instances: 3
    python_bin: python2.6
    interpreter: python2.6
    ami_version: 2.4.11
    iam_job_flow_role: EMR_DefaultRole
    jobconf: {"mapred.task.timeout": 600000, "mapred.output.direct.NativeS3FileSystem": false}
    base_tmp_dir: /tmp
    enable_emr_debugging: true
    cmdenv:
      TZ: America/New_York
    s3_log_uri: s3://mrjob-lists/tmp/logs/
    s3_scratch_uri: s3://mrjob-lists/tmp/
    output_dir: s3://mrjob-lists/output
    ssh_tunnel_is_open: true
    ssh_tunnel_to_job_tracker: true
I am using EMR to run the job, and my mapper task has:
print "test"
as well as
sys.stdout.write("TEst")
However, I cannot find this output in the stdout files on S3. Where is the output written?
The mapper stdout for a Hadoop 1 job should appear in the S3 logs under /task-attempts/job_#####_##/attempt_#####_##_##/stdout.gz
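A minimal sketch for pulling those files down, assuming Python 3 with boto3 installed on your local machine (not the cluster's python2.6) and the s3_log_uri from the mrjob.conf above. The function names are hypothetical helpers, not part of mrjob. The sketch only assumes that the keys end in task-attempts/job_.../attempt_.../stdout.gz as described; it does not assume what directory (such as a job flow ID) sits between the log prefix and task-attempts.

```python
import gzip
import re
from urllib.parse import urlparse

# Matches keys ending in "task-attempts/job_<id>/attempt_<id>/stdout.gz",
# the layout described in the answer. Whatever sits between the log prefix
# and "task-attempts" (for example a job flow ID directory) is not assumed.
TASK_STDOUT_RE = re.compile(
    r"(?:^|/)task-attempts/(?P<job>job_[^/]+)/(?P<attempt>attempt_[^/]+)/stdout\.gz$"
)


def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError("not an s3:// URI: %r" % uri)
    return parsed.netloc, parsed.path.lstrip("/")


def task_stdout_keys(keys):
    """Yield (job_id, attempt_id, key) for each task-attempt stdout.gz key."""
    for key in keys:
        match = TASK_STDOUT_RE.search(key)
        if match:
            yield match.group("job"), match.group("attempt"), key


def decode_log(gz_bytes):
    """Decompress the bytes of a downloaded stdout.gz object into text."""
    return gzip.decompress(gz_bytes).decode("utf-8", errors="replace")


def list_keys(s3, bucket, prefix):
    """Yield every key under bucket/prefix using a boto3 S3 client."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def print_task_stdout(log_uri):
    """Print the contents of every task-attempt stdout.gz under log_uri."""
    import boto3  # imported here so the pure helpers above work without boto3

    s3 = boto3.client("s3")
    bucket, prefix = split_s3_uri(log_uri)
    for job_id, attempt_id, key in task_stdout_keys(list_keys(s3, bucket, prefix)):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        print("== %s %s" % (job_id, attempt_id))
        print(decode_log(body))
```

Calling print_task_stdout("s3://mrjob-lists/tmp/logs/") lists every key under that prefix, so on a bucket that holds logs from many job flows you may want to pass a narrower prefix.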
It does take a little while for these logs to be pushed to S3. If you leave the cluster running, you can check the Hadoop JobTracker web interface and confirm that the output also appears in the local logs shortly after the job runs.
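Because of that delay, a script that checks S3 right after the job finishes may find nothing yet. The following is a hypothetical polling helper, not an mrjob or boto3 feature: it repeatedly calls whatever listing function you pass in (for example, one that returns the task-attempt stdout.gz keys) until something shows up or a timeout passes. The 600-second timeout and 30-second interval defaults are arbitrary assumptions; sleep and clock are injectable so the loop can be tested without actually waiting.

```python
import time


def wait_for_keys(fetch_keys, timeout=600.0, interval=30.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_keys() until it returns something or timeout seconds pass.

    fetch_keys is a hypothetical zero-argument callable that returns the keys
    you are waiting for. Returns the keys found, or an empty list if the
    deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        keys = list(fetch_keys())
        if keys or clock() >= deadline:
            return keys
        # Never sleep past the deadline, so the final check happens on time.
        sleep(min(interval, max(0.0, deadline - clock())))
```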