
Getting error while writing data onto Cloud Bigtable through Dataflow

I am using a 2nd gen Cloud Function to trigger a Dataflow job. The Dataflow template basically reads parquet files from Cloud Storage and loads the data into Bigtable. Here are the code and package details; a sketch of the triggering side follows the requirements for context.

import os
import datetime
import logging
from configparser import ConfigParser
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.gcp.bigtableio import WriteToBigTable
from google.cloud import bigtable
from google.cloud.bigtable.row import DirectRow

logger = logging.getLogger()
logger.setLevel(logging.INFO)

config_object = ConfigParser()
config_object.read("config.ini")

project_id = config_object["uprn"]["project_id"]
instance_id = config_object["uprn"]["instance_id"]
table_id = config_object["uprn"]["table_id"]
column_family_id = config_object["uprn"]["column_family_id"]
#input_columns = config_object["uprn"]["input_columns"]
timestamp = datetime.datetime(1970, 1, 1)
logging.info("--Starting..")

#client = bigtable.Client(project=project_id, admin=True)
#instance = client.instance(instance_id)
#table = instance.table(table_id)

def big_table_load(ele):
    """Convert one parquet record (a dict) into a list of Bigtable DirectRows."""
    row_key = None
    try:
        rows = []
        column_names = list(ele.keys())
        row_key = str(ele['uprn']).encode()
        logging.info("--row_key " + str(row_key))
        row = DirectRow(row_key)

        for key in column_names:
            row.set_cell(
                column_family_id, key, str(ele[key]).encode('utf-8'), timestamp=timestamp
            )
        rows.append(row)
        return rows
    except Exception as e:
        logging.error("Error encountered for row_key " + str(row_key) + " with error message " + str(e))

def find_err_file():
    # Note: relies on an --efilename value provider that is never declared in UserOptions below.
    filename_err = user_options.efilename.get()
    return filename_err


class UserOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument('--input_location',
                                           default='gs://my-proj-dev-local-landing-zone/mock_data/*'
                                           )


pipeline_options = PipelineOptions()
user_options = pipeline_options.view_as(UserOptions)


def run():
    try:
        with beam.Pipeline(options=pipeline_options) as p:
            records = (
                p
                | 'Read' >> beam.io.ReadFromParquet(user_options.input_location)
                | 'Format Rows' >> beam.ParDo(big_table_load)
                | 'Write to BigTable' >> WriteToBigTable(
                    project_id=project_id,
                    instance_id=instance_id,
                    table_id=table_id)
            )
    except Exception as e:
        logging.info(e)
        raise e


if __name__ == '__main__':
    run()

requirements.txt

google-cloud-bigtable==1.7.0
apache-beam[gcp]==2.39.0
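
The triggering Cloud Function itself is not shown in the post. As mentioned above, here is a rough sketch of what that side might look like, assuming a classic Dataflow template staged at a made-up gs:// path and the functions-framework and google-api-python-client packages; every name and parameter in it is an assumption, not code from the question:

# Hypothetical sketch: a 2nd gen (HTTP) Cloud Function that launches the
# staged Dataflow template via the Dataflow REST API.
import functions_framework
from googleapiclient.discovery import build

TEMPLATE_PATH = "gs://my-proj-dev-templates/parquet_to_bigtable"  # assumed staging path

@functions_framework.http
def trigger_dataflow(request):
    dataflow = build("dataflow", "v1b3")
    launch = dataflow.projects().locations().templates().launch(
        projectId="my-proj-dev",   # assumed project id
        location="europe-west2",   # assumed region
        gcsPath=TEMPLATE_PATH,
        body={"jobName": "parquet-to-bigtable-load", "parameters": {}},
    )
    return launch.execute()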

Error processing instruction process_bundle-4225915941562411087-3. Original traceback is

Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1232, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
  File "apache_beam/runners/common.py", line 475, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
  File "apache_beam/runners/common.py", line 481, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigtableio.py", line 187, in finish_bundle
    self.batcher.flush()
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigtableio.py", line 88, in flush
    status.code)))
Exception: Failed to write a batch of 12 records due to 'not_found'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 267, in _execute
    response = task()
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 340, in <lambda>
    lambda: self.create_worker().do_instruction(request), request)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 581, in do_instruction
    getattr(request, request_type), request.instruction_id)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/sdk_worker.py", line 618, in process_bundle
    bundle_processor.process_bundle(instruction_id))
  File "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1001, in process_bundle
    op.finish()
  File "apache_beam/runners/worker/operations.py", line 736, in apache_beam.runners.worker.operations.DoOperation.finish
  File "apache_beam/runners/worker/operations.py", line 738, in apache_beam.runners.worker.operations.DoOperation.finish
  File "apache_beam/runners/worker/operations.py", line 739, in apache_beam.runners.worker.operations.DoOperation.finish
  File "apache_beam/runners/common.py", line 1253, in apache_beam.runners.common.DoFnRunner.finish
  File "apache_beam/runners/common.py", line 1234, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
  File "apache_beam/runners/common.py", line 1281, in apache_beam.runners.common.DoFnRunner._reraise_augmented
  File "apache_beam/runners/common.py", line 1232, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
  File "apache_beam/runners/common.py", line 475, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
  File "apache_beam/runners/common.py", line 481, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigtableio.py", line 187, in finish_bundle
    self.batcher.flush()
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigtableio.py", line 88, in flush
    status.code)))
Exception: Failed to write a batch of 12 records due to 'not_found' [while running 'WriteToBigTable/ParDo(_BigTableWriteFn)-ptransform-43']

You are getting a 'not_found' error: do the table and the column family you are writing to exist?
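
For reference, a minimal pre-flight check along those lines, reusing the same config.ini values as the pipeline; the GC rule is an assumption, pick whatever fits the data:

# Minimal sketch (not from the original answer): verify that the target table
# and column family exist before running the pipeline, creating them if missing.
from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(project=project_id, admin=True)  # admin=True enables table admin calls
instance = client.instance(instance_id)
table = instance.table(table_id)

gc_rule = column_family.MaxVersionsGCRule(1)  # assumed policy: keep one version per cell

if not table.exists():
    # Create the table together with the column family the pipeline writes to.
    table.create(column_families={column_family_id: gc_rule})
elif column_family_id not in table.list_column_families():
    # The table exists but the family is missing; add it.
    table.column_family(column_family_id, gc_rule=gc_rule).create()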
