簡體   English   中英

Apache Pulsar 的 Delta Lake Sink 連接器,帶有 miniO 拋出 (java.lang.IllegalArgumentException)

[英]Delta Lake Sink Connector for Apache Pulsar with miniO throws (java.lang.IllegalArgumentException)

我正在嘗試運行新的 apache pulsar Lakehouse Sink Connector ,我得到java.lang.IllegalArgumentException

下面是我的設置。 docker-compose.yaml 文件:

version: '3.7'
volumes:
  mssql-data:
  minio-data:
networks:
  oentity:
    driver: bridge
services:
  pulsar:
    image: apachepulsar/pulsar:latest
    command: bin/pulsar standalone
    hostname: pulsar
    ports:
      - "8080:8080"
      - "6650:6650"
    restart: unless-stopped
    networks:
      oentity:
    volumes:
      - "./data/:/pulsar/data"
      - "./connectors/:/pulsar/connectors"
  dashboard:
    image: apachepulsar/pulsar-manager:latest
    ports:
      - "9528:9527"
      - "7750:7750"
    networks:
      oentity:
    depends_on:
      - pulsar
    links:
      - pulsar
    environment:
      SPRING_CONFIGURATION_FILE: /pulsar-manager/pulsar-manager/application.properties
  minio:
    image: 'minio/minio:latest'
    hostname: minio
    container_name: minio
    ports:
      - '9000:9000'
      - '9001:9001'
    volumes:
      - minio-data:/data
    environment:
      MINIO_ROOT_USER: minio
      MINIO_ROOT_PASSWORD: minio123
      MINIO_ACCESS_KEY: minio
      MINIO_SECRET_KEY: minio123
    command: server --console-address ":9001" /data
    networks:
      oentity:
  • 我在這里下載了連接器,並將 NAR package 復制到容器中的 Pulsar 連接器目錄$PULSAR_HOME/connectors中。

  • 我從 http://localhost:9001/login 登錄到 miniO 並創建了一個桶調用 Lakehouse。

  • 我使用了與此處描述的配置類似的配置,並將tablePath值替換為我的 miniO 路徑。 我將文件命名為sink-connector-config.json

{
  "tenant":"public",
  "namespace":"default",
  "name":"delta_sink",
  "parallelism":1,
  "inputs": [
    "test-delta-pulsar"
  ],
  "archive": "connectors/pulsar-io-lakehouse-2.9.3.7-cloud.nar",
  "processingGuarantees":"EFFECTIVELY_ONCE",
  "configs":{
      "type":"delta",
      "maxCommitInterval":120,
      "maxRecordsPerCommit":10000000,
      "tablePath": "s3a://lakehouse/delta_sink",
      "hadoop.fs.s3a.aws.credentials.provider": "com.amazonaws.auth.DefaultAWSCredentialsProviderChain"
  }
}
  • 我從容器中運行了 Lakehouse 水槽連接器。 docker exec -it <container name> bash

然后我執行了

PULSAR_HOME/bin/pulsar-admin sink localrun \
--sink-config-file sink-connector-config.json

我得到了下面的錯誤;

2022-09-06T16:53:08,396+0000 [main] INFO  org.apache.pulsar.functions.utils.io.ConnectorUtils - Found connector ConnectorDefinition(name=lakehouse, description=Lakehouse connectors, sourceClass=org.apache.pulsar.ecosystem.io.lakehouse.SourceConnector, sinkClass=org.apache.pulsar.ecosystem.io.lakehouse.SinkConnector, sourceConfigClass=org.apache.pulsar.ecosystem.io.lakehouse.SourceConnectorConfig, sinkConfigClass=org.apache.pulsar.ecosystem.io.lakehouse.SinkConnectorConfig) from /pulsar/connectors/pulsar-io-lakehouse-2.9.3.7-cloud.nar
2022-09-06T16:53:44,562+0000 [main] ERROR org.apache.pulsar.functions.LocalRunner - Encountered error starting localrunner
java.lang.IllegalArgumentException: Could not validate sink config: Cannot construct instance of `org.apache.pulsar.ecosystem.io.lakehouse.SinkConnectorConfig` (no Creators, like default constructor, exist): abstract types either need to be mapped 
to concrete types, have custom deserializer, or contain additional type information
 at [Source: UNKNOWN; byte offset: #UNKNOWN]
        at org.apache.pulsar.functions.utils.SinkConfigUtils.validateSinkConfig(SinkConfigUtils.java:594) ~[org.apache.pulsar-pulsar-functions-utils-2.9.3.jar:2.9.3]
        at org.apache.pulsar.functions.utils.SinkConfigUtils.validateAndExtractDetails(SinkConfigUtils.java:441) ~[org.apache.pulsar-pulsar-functions-utils-2.9.3.jar:2.9.3]
        at org.apache.pulsar.functions.LocalRunner.start(LocalRunner.java:439) ~[org.apache.pulsar-pulsar-functions-local-runner-original-2.9.3.jar:2.9.3]
        at org.apache.pulsar.functions.LocalRunner.main(LocalRunner.java:198) [org.apache.pulsar-pulsar-functions-local-runner-original-2.9.3.jar:2.9.3]
root@pulsar:/pulsar#

由於您將其作為 bild-in 連接器運行,您是否確認該連接器可用,例如$ pulsar-admin sinks available-sinks

您是否嘗試過將其作為非內置連接器運行,例如

PULSAR_HOME/bin/pulsar-admin sinks create \
--sink-config-file <lakehouse-sink-config.yaml>

看看我的例子

https://github.com/tspannhw/FLiP-Pi-DeltaLake-Thermal

您可能需要文件類型

配置:類型:“delta” maxCommitInterval:120 maxRecordsPerCommit:10_000_000 tablePath:“file:///opt/demo/lakehouse” processingGuarantees:“EXACTLY_ONCE” deltaFileType:“parquet”訂閱類型:“故障轉移”

也不要做本地運行,做一個創建

bin/pulsar-admin sinks create --archive./connectors/pulsar-io-lakehouse-2.9.2.24.nar --tenant public --namespace default --name delta_sink --sink-config-file conf/deltalakesink.yml - -inputs "persistent://public/default/pi-sensors" --parallelism 1 --subs-name pisensorwatch --processing-guarantees EFFECTIVELY_ONCE

也嘗試 YML 配置而不是 JSON

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM