
Google cloud dataflow job creation error: "Cannot set worker pool zone. Please check whether the worker_region experiments flag is valid"

I am trying to create a Dataflow job to index a BigQuery table into Elasticsearch with the Node package google-cloud/dataflow.v1beta3.

The job works fine when it's created and launched from the Google Cloud console, but I get the following error when I try it in Node:

Error: 3 INVALID_ARGUMENT: (b69ddc3a5ef1c40b): Cannot set worker pool zone. Please check whether the worker_region experiments flag is valid. Causes: (b69ddc3a5ef1cd76): An internal service error occurred.

I have tried to specify the experiments parameter in various ways, but I always end up with the same error.

Has anyone managed to get a similar Dataflow job working? Or do you have any information about Dataflow experiments?

Here is the code:

const { JobsV1Beta3Client } = require('@google-cloud/dataflow').v1beta3

const dataflowClient = new JobsV1Beta3Client()
const response = await dataflowClient.createJob({
  projectId: 'myGoogleCloudProjectId',
  location: 'europe-west1',
  job: {
    launch_parameter: {
      jobName: 'indexation-job',
      containerSpecGcsPath: 'gs://dataflow-templates-europe-west1/latest/flex/BigQuery_to_Elasticsearch',
      parameters: {
        inputTableSpec: 'bigQuery-table-gs-adress',
        connectionUrl: 'elastic-endpoint-url',
        index: 'elastic-index',
        elasticsearchUsername: 'username',
        elasticsearchPassword: 'password'
      }
    },
    environment: {
      experiments: ['worker_region']
    }
  }
})

Thank you very much for your help.

After many attempts, I managed yesterday to find out how to specify the worker region. It looks like this:

await dataflowClient.createJob({
  projectId,
  location,
  job: {
    name: 'jobName',
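    // NOTE (assumption): Job.type is an enum in the v1beta3 API, so 'JOB_TYPE_BATCH' may be what is expected here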
    type: 'Batch',
    containerSpecGcsPath: 'gs://dataflow-templates-europe-west1/latest/flex/BigQuery_to_Elasticsearch',
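    // NOTE (assumption): pipelineDescription is an output-only field in the Dataflow API,
    // which is probably why these template parameters are not picked up (see launchFlexTemplate below)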
    pipelineDescription: {
      inputTableSpec: 'bigquery-table',
      connectionUrl: 'elastic-url',
      index: 'elastic-index',
      elasticsearchUsername: 'username',
      elasticsearchPassword: 'password',
      project: projectId,
      appName: 'BigQueryToElasticsearch'
    },
    environment: {
      workerPools: [
        { region: 'europe-west1' }
      ]
    }
  }  
})

It isn't fully working yet; I still need to find the correct way to provide the other parameters, but the Dataflow job is now created in the Google Cloud console.
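
As a quick way to confirm the job really exists, here is a minimal sketch (assuming the same @google-cloud/dataflow v1beta3 package and default application credentials) that lists the jobs in the region:

const { JobsV1Beta3Client } = require('@google-cloud/dataflow').v1beta3

// List recent jobs in the region and print their states
// to verify the new job was actually created.
const jobsClient = new JobsV1Beta3Client()
const [jobs] = await jobsClient.listJobs({
  projectId: 'myGoogleCloudProjectId',  // placeholder, as above
  location: 'europe-west1'
})
for (const job of jobs) {
  console.log(job.name, job.currentState)
}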

For anyone struggling with this issue, I finally found out how to launch a Dataflow job from a template.

There is a function, launchFlexTemplate, that works the same way as job creation in the Google Cloud console.

Here is the final version, working correctly:

const { FlexTemplatesServiceClient } = require('@google-cloud/dataflow').v1beta3

const dataflowClient = new FlexTemplatesServiceClient()
const response = await dataflowClient.launchFlexTemplate({
  projectId: 'google-project-id',
  location: 'europe-west1',
  launchParameter: {
    jobName: 'job-name',
    containerSpecGcsPath: 'gs://dataflow-templates-europe-west1/latest/flex/BigQuery_to_Elasticsearch',
    parameters: {
      apiKey: 'elastic-api-key',  // mandatory but not used if you provide username and password
      connectionUrl: 'elasticsearch endpoint',
      index: 'elasticsearch index',
      elasticsearchUsername: 'username',
      elasticsearchPassword: 'password',
      inputTableSpec: 'bigquery source table',  // projectId:datasetId.table

      // parameters to upsert the elasticsearch index
      propertyAsId: 'table column used for the elastic _id',
      usePartialUpdate: 'true',  // template parameters are passed as strings
      bulkInsertMethod: 'INDEX'
    }
  }
})
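
If you want to follow the job after launching it, here is a small sketch on top of the call above. It assumes the usual @google-cloud client behavior, where the call resolves to a [LaunchFlexTemplateResponse] tuple whose job field carries the new job's id; treat it as a sketch rather than a verified recipe.

const { JobsV1Beta3Client } = require('@google-cloud/dataflow').v1beta3

// Sketch only: pull the created job's id out of the launch response tuple
// and ask the jobs client for its current state.
const [launchResponse] = response
const jobsClient = new JobsV1Beta3Client()
const [job] = await jobsClient.getJob({
  projectId: 'google-project-id',
  location: 'europe-west1',
  jobId: launchResponse.job.id
})
console.log(`${job.name}: ${job.currentState}`)  // e.g. JOB_STATE_PENDING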
