
Encountering "AssertionError: Job did not reach to a terminal state after waiting indefinitely" with Beam/Dataflow

I am trying to use Apache Beam and Dataflow to speed up some data processing, but the job fails with:

AssertionError: Job did not reach to a terminal state after waiting indefinitely.

I have simplified my pipeline for testing but still get the error, although the same code runs successfully locally with the DirectRunner, so I figure it is either some naive setup issue or a bug in Beam/Dataflow. I also looked around and found another issue that produces a similar error, caused by reading a large amount of data from Google Cloud Storage, but that one is likely already fixed, and I don't think my case is related since my minimal code reads no data and still fails. Below is my minimal code (the long argparse section is kept because I suspect the error might relate to it):

import os
import argparse
import apache_beam as beam
import logging

def run(argv=None, save_main_session=True) -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument('--given_landmarks', default=False, type=bool,
                        help="Whether to use pre-selected landmark objects")
    parser.add_argument('--hmm_type', default='path_specific', type=str, choices=['path_specific', 'hard_em', 'random'],
                        help='The HMM type. Currently Path-specific, Hard EM, and Random are available.')
    parser.add_argument('--magnitude_normalization', default='normal', type=str, choices=['gamma', 'normal'],
                        help="Distribution type for calculating probability of magnitude for Observer.")
    parser.add_argument('--instruction_type', default='full', type=str,
                        choices=['full', 'object_only', 'direction_only',
                                'mask_object', 'mask_direction'],
                        help='Toggle for full/object-only/direction-only instructions.')
    parser.add_argument('--num_instructions', default=1, type=int,
                        help="The number of instructions to generate per path")
    parser.add_argument('--mp3d_dir', default='/path/to/matterport_data/', type=str,
                        help='Path to Room-to-Room scan data.')
    parser.add_argument('--path_input_dir', default=None, type=str,
                        help='Path to Room-to-Room JSON data.')

    parser.add_argument('--dataset', default=None, type=str, choices=[
                        'R2R', 'R4R', 'RxR'], help='Data source.')

    parser.add_argument(
        '--file_identifier', default='val_seen', type=str,
        help='Source JSON file identifier for Crafty instruction creation.')

    parser.add_argument('--output_file', default=None, type=str,
                    help='Output file to save generated instructions.')

    parser.add_argument(
        '--appraiser_file', type=str,
        default='./crafty.object_idfs.r2r_train.txt',
        help='File to read appraiser information from.')

    parser.add_argument(
        '--full_train_file_path', default=None, type=str,
        help='Path to full training file, for EM training covering all partitions.')

    args, pipeline_args = parser.parse_known_args()
    print(args)
    if not os.path.exists(args.output_file):
        os.makedirs(args.output_file)

    def pipeline(root):

        logging.info('Starting Beam pipeline.')
        outputs = (
        root
        | 'create_input_1' >> beam.Create([1,2,3,4,5])
        | 'map' >> beam.Map(lambda x: (x, 1))
        )
        outputs | beam.Map(print)

    pipeline_options = beam.options.pipeline_options.PipelineOptions(pipeline_args)
    # pipeline_options = beam.options.pipeline_options.PipelineOptions()
    # pipeline_options.view_as(beam.options.pipeline_options.SetupOptions).save_main_session = save_main_session
    # pipeline_options.view_as(beam.options.pipeline_options.DirectOptions).direct_num_workers = os.cpu_count()
    # pipeline_options.view_as(beam.options.pipeline_options.DirectOptions).direct_running_mode = "multi-processing"

    with beam.Pipeline(options=pipeline_options) as root:
        pipeline(root)


if __name__ == '__main__':
    run()

And my command, following the example here:

 python test.py \
    --path_input_dir gs://somepath \
    --dataset somename  \
    --mp3d_dir gs://somepath  \
    --file_identifier someid  \
    --output_file gs://some/other/path  \
    --num_instructions 1 \
    --region us-east1 \
    --runner DataflowRunner \
    --project someproject-id \
    --temp_location gs://someloc

Thanks for any comments or suggestions!

Not a perfect answer, but this error message indicates that the thread watching and waiting for your job to finish was terminated before the job completed, even though you did not specify a maximum time to wait. The thread could have died for a variety of reasons.

The error occurs here in the Beam codebase, for reference.
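If you do not want the local process to wait without a bound, note that the with beam.Pipeline(...) context manager calls wait_until_finish() with no timeout on exit, and that assertion can only fire when no duration is given. Below is a minimal sketch, not from the original post, that runs the same tiny pipeline but bounds the wait; it assumes the pipeline_options object built in the question, and wait_until_finish takes its duration in milliseconds:

import apache_beam as beam
from apache_beam.runners.runner import PipelineState

# Build the pipeline without the `with` block so we control the wait.
p = beam.Pipeline(options=pipeline_options)
_ = (p
     | 'create_input_1' >> beam.Create([1, 2, 3, 4, 5])
     | 'map' >> beam.Map(lambda x: (x, 1))
     | 'print' >> beam.Map(print))

result = p.run()
# Wait up to 30 minutes. With a duration set, a dead monitoring thread
# makes wait_until_finish return rather than raise this AssertionError.
result.wait_until_finish(duration=30 * 60 * 1000)
if result.state != PipelineState.DONE:
    # Inspect the job page / worker logs in the Cloud Console for the
    # real cause; the local assertion message carries no detail.
    print('Job ended in state:', result.state)

This does not fix whatever killed the polling thread, but it turns an opaque AssertionError into a job state you can inspect.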

Did you check the logs? It may be a permissions issue. I received the same error, and in the job logs I had this message:

Workflow failed. Causes: Permissions verification for controller service account failed. All permissions in IAM role roles/dataflow.worker should be granted to controller service account XXXXXXXXXXXXX-compute@developer.gserviceaccount.com.
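If your job logs show the same cause, granting the missing role to the controller service account should clear it. A sketch with gcloud, where PROJECT_ID and the service account email are placeholders to substitute with your own values from the error message:

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:XXXXXXXXXXXXX-compute@developer.gserviceaccount.com" \
    --role="roles/dataflow.worker"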
