Encountering "AssertionError: Job did not reach to a terminal state after waiting indefinitely" with Beam/Dataflow
I'm trying to use Apache Beam with Dataflow to speed up some data processing, but the job fails with:
'Job did not reach to a terminal state after waiting indefinitely.') AssertionError: Job did not reach to a terminal state after waiting indefinitely.
I've reduced my pipeline to a minimal test, but the error persists (the same pipeline runs fine locally with the DirectRunner, so I suspect this is either a naive setup mistake on my side or a bug in Beam/Dataflow). I also looked around and found another question with a similar error, caused by reading large amounts of data from Google Storage, which may already have been fixed. I don't think my case is related, since my minimal code fails without reading any data. Here is my minimal code (I kept the long argparse section because I suspect the error might be related to it):
import os
import argparse
import logging

import apache_beam as beam


def run(argv=None, save_main_session=True) -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument('--given_landmarks', default=False, type=bool,
                        help='Whether to use pre-selected landmark objects')
    parser.add_argument('--hmm_type', default='path_specific', type=str,
                        choices=['path_specific', 'hard_em', 'random'],
                        help='The HMM type. Currently Path-specific, Hard EM, and Random are available.')
    parser.add_argument('--magnitude_normalization', default='normal', type=str,
                        choices=['gamma', 'normal'],
                        help='Distribution type for calculating probability of magnitude for Observer.')
    parser.add_argument('--instruction_type', default='full', type=str,
                        choices=['full', 'object_only', 'direction_only',
                                 'mask_object', 'mask_direction'],
                        help='Toggle for full/object-only/direction-only instructions.')
    parser.add_argument('--num_instructions', default=1, type=int,
                        help='The number of instructions to generate per path')
    parser.add_argument('--mp3d_dir', default='/path/to/matterport_data/', type=str,
                        help='Path to Room-to-Room scan data.')
    parser.add_argument('--path_input_dir', default=None, type=str,
                        help='Path to Room-to-Room JSON data.')
    parser.add_argument('--dataset', default=None, type=str,
                        choices=['R2R', 'R4R', 'RxR'], help='Data source.')
    parser.add_argument('--file_identifier', default='val_seen', type=str,
                        help='Source JSON file identifier for Crafty instruction creation.')
    parser.add_argument('--output_file', default=None, type=str,
                        help='Output file to save generated instructions.')
    parser.add_argument('--appraiser_file', type=str,
                        default='./crafty.object_idfs.r2r_train.txt',
                        help='File to read appraiser information from.')
    parser.add_argument('--full_train_file_path', default=None, type=str,
                        help='Path to full training file, for EM training covering all partitions.')
    args, pipeline_args = parser.parse_known_args()
    print(args)

    if not os.path.exists(args.output_file):
        os.makedirs(args.output_file)

    def pipeline(root):
        logging.info('Starting Beam pipeline.')
        outputs = (
            root
            | 'create_input_1' >> beam.Create([1, 2, 3, 4, 5])
            | 'map' >> beam.Map(lambda x: (x, 1))
        )
        outputs | beam.Map(print)

    pipeline_options = beam.options.pipeline_options.PipelineOptions(pipeline_args)
    # pipeline_options = beam.options.pipeline_options.PipelineOptions()
    # pipeline_options.view_as(beam.options.pipeline_options.SetupOptions).save_main_session = save_main_session
    # pipeline_options.view_as(beam.options.pipeline_options.DirectOptions).direct_num_workers = os.cpu_count()
    # pipeline_options.view_as(beam.options.pipeline_options.DirectOptions).direct_running_mode = "multi-processing"
    with beam.Pipeline(options=pipeline_options) as root:
        pipeline(root)


if __name__ == '__main__':
    run()
And my command is:
python test.py \
--path_input_dir gs://somepath \
--dataset somename \
--mp3d_dir gs://somepath \
--file_identifier someid \
--output_file gs://some/other/path \
--num_instructions 1 \
--region us-east1 \
--runner DataflowRunner \
--project someproject-id \
--temp_location gs://someloc
Thanks for any comments or suggestions!
Not a perfect answer, but this error message indicates that the thread that watches and waits for your job to finish died even though the job did not complete, and even though you did not specify a maximum time to wait. It can die for a variety of reasons.
The error is raised in the Beam codebase, for reference.
Have you checked the logs? It may be a permission problem. I got the same error, and in the job logs I found this message:
Workflow failed. Causes: Permissions verification for controller service account failed. All permissions in IAM role roles/dataflow.worker should be granted to controller service account XXXXXXXXXXXXX-compute@developer.gserviceaccount.com.
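If that is the cause, granting the role to the controller service account should fix it. A sketch of the binding command, with PROJECT_ID and the service-account address standing in for your own values:

```shell
# Grant the Dataflow worker role to the controller service account.
# Replace PROJECT_ID and the account address with your own values.
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:XXXXXXXXXXXXX-compute@developer.gserviceaccount.com" \
  --role="roles/dataflow.worker"
```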