
Where does sys.stdout.write() go to in MRJOB mapper?

mrjob.conf

runners:
  emr:
    aws_access_key_id: **
    aws_secret_access_key: **
    aws_region: us-east-1
    aws_availability_zone: us-east-1a
    ec2_key_pair: scrapers2
    ec2_key_pair_file: ~/arachnid.pem
    ec2_instance_type: c3.8xlarge
    ec2_master_instance_type: c3.8xlarge
    num_ec2_instances: 3
    python_bin: python2.6
    interpreter: python2.6
    ami_version: 2.4.11
    iam_job_flow_role: EMR_DefaultRole
    jobconf: {"mapred.task.timeout": 600000, "mapred.output.direct.NativeS3FileSystem": false}
    base_tmp_dir: /tmp
    enable_emr_debugging: true
    cmdenv:
        TZ: America/New_York
    s3_log_uri: s3://mrjob-lists/tmp/logs/
    s3_scratch_uri: s3://mrjob-lists/tmp/
    output_dir: s3://mrjob-lists/output
    ssh_tunnel_is_open: true
    ssh_tunnel_to_job_tracker: true

I am using EMR to run the job, and my mapper task has:

print "test"

as well as

sys.stdout.write("TEst")

However, I cannot find this output in the stdout files on S3. Where is the output written?

The mapper stdout for a Hadoop 1 job should appear in the S3 logs under /task-attempts/job_#####_##/attempt_#####_##_##/stdout.gz

It does take a little while for these to push to S3. If you leave the cluster running, you can check the Hadoop JobTracker web interface and make sure the output appears in the local logs as well, just after the job finishes.
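To make the log layout above concrete, here is a minimal sketch in Python of how the stdout.gz location could be assembled and how a downloaded copy could be read. The helper names (task_stdout_key, read_gzipped_log) and the example IDs are hypothetical and not part of mrjob or EMR. The log_root argument is an assumption: it stands for whatever prefix your cluster's logs actually land under (usually somewhere beneath the configured s3_log_uri). The sketch uses only the standard library and does not download anything itself.

```python
import gzip
import io


def task_stdout_key(log_root, job_id, attempt_id):
    """Build the location of one task attempt's stdout log.

    log_root is an assumed prefix for the cluster's logs; job_id and
    attempt_id are the job_... and attempt_... names shown in the
    JobTracker (the values used with this helper are placeholders).
    """
    return "/".join(
        [log_root.rstrip("/"), "task-attempts", job_id, attempt_id, "stdout.gz"]
    )


def read_gzipped_log(raw_bytes):
    """Decompress the bytes of a downloaded stdout.gz and return text."""
    with gzip.GzipFile(fileobj=io.BytesIO(raw_bytes)) as handle:
        return handle.read().decode("utf-8", "replace")
```

After fetching the object at the returned location with your S3 client of choice, pass its raw bytes to read_gzipped_log to see whatever the task attempt wrote.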
