Kafka Connect and HDFS in Docker
I am using the Kafka Connect HDFS sink connector and Hadoop (for HDFS) with docker-compose.
Hadoop (namenode and datanode) seems to work correctly.
However, the Kafka Connect sink fails with this error:
ERROR Recovery failed at state RECOVERY_PARTITION_PAUSED
(io.confluent.connect.hdfs.TopicPartitionWriter:277)
org.apache.kafka.connect.errors.DataException:
Error creating writer for log file hdfs://namenode:8020/logs/MyTopic/0/log
For reference, here are the Hadoop services in my docker-compose.yml:
namenode:
  image: uhopper/hadoop-namenode:2.8.1
  hostname: namenode
  container_name: namenode
  ports:
    - "50070:50070"
  networks:
    default:
    fides-webapp:
      aliases:
        - "hadoop"
  volumes:
    - namenode:/hadoop/dfs/name
  env_file:
    - ./hadoop.env
  environment:
    - CLUSTER_NAME=hadoop-cluster

datanode1:
  image: uhopper/hadoop-datanode:2.8.1
  hostname: datanode1
  container_name: datanode1
  networks:
    default:
    fides-webapp:
      aliases:
        - "hadoop"
  volumes:
    - datanode1:/hadoop/dfs/data
  env_file:
    - ./hadoop.env
And my Kafka Connect properties file:
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=MyTopic
hdfs.url=hdfs://namenode:8020
flush.size=3
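Edit:
I added an environment variable so that Kafka Connect knows the cluster name (CLUSTER_NAME, set on the kafka-connect service in the docker-compose file).
The error is now different, so that part of the problem seems to be solved: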
INFO Starting commit and rotation for topic partition scoring-topic-0 with start offsets {partition=0=0} and end offsets {partition=0=2}
(io.confluent.connect.hdfs.TopicPartitionWriter:368)
ERROR Exception on topic partition MyTopic-0: (io.confluent.connect.hdfs.TopicPartitionWriter:403)
org.apache.kafka.connect.errors.DataException: org.apache.hadoop.ipc.RemoteException(java.io.IOException):
File /topics/+tmp/MyTopic/partition=0/bc4cf075-ccfa-4338-9672-5462cc6c3404_tmp.avro
could only be replicated to 0 nodes instead of minReplication (=1).
There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
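This message usually means that the namenode is up and knows about the datanode, but the writing client (here Kafka Connect) failed to stream a block to that datanode, so it was excluded. In Docker setups, a common cause is that the client cannot reach the datanode's data-transfer port. The sketch below is a quick, hypothetical reachability check you could run from inside the kafka-connect container. The host names come from the compose file above. Port 50010 is assumed to be the Hadoop 2.x default for dfs.datanode.address (i.e. not overridden in hadoop.env).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        # create_connection resolves the name and tries each returned address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failures, refused connections and timeouts
        return False


# Hypothetical usage from inside the kafka-connect container:
#   can_connect("namenode", 8020)    # namenode RPC port from hdfs.url
#   can_connect("datanode1", 50010)  # assumed Hadoop 2.x datanode data port
```

If the namenode check succeeds but the datanode check fails, the cluster itself is healthy and the problem lies in how the client resolves or reaches the datanode.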
Edit 2:
The hadoop.env file is:
CORE_CONF_fs_defaultFS=hdfs://namenode:8020
# Configure default BlockSize and Replication for local
# data. Keep it small for experimentation.
HDFS_CONF_dfs_blocksize=1m
YARN_CONF_yarn_log___aggregation___enable=true
YARN_CONF_yarn_resourcemanager_recovery_enabled=true
YARN_CONF_yarn_resourcemanager_store_class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
YARN_CONF_yarn_resourcemanager_fs_state___store_uri=/rmstate
YARN_CONF_yarn_nodemanager_remote___app___log___dir=/app-logs
YARN_CONF_yarn_log_server_url=http://historyserver:8188/applicationhistory/logs/
YARN_CONF_yarn_timeline___service_enabled=true
YARN_CONF_yarn_timeline___service_generic___application___history_enabled=true
YARN_CONF_yarn_resourcemanager_system___metrics___publisher_enabled=true
YARN_CONF_yarn_resourcemanager_hostname=resourcemanager
YARN_CONF_yarn_timeline___service_hostname=historyserver
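These variables are not read by Hadoop directly. The image's entrypoint turns them into XML properties. The sketch below assumes the uhopper images follow the same convention as the widely used bde2020 Hadoop images. Under that convention, the prefix (CORE_CONF_, HDFS_CONF_, YARN_CONF_) selects core-site.xml, hdfs-site.xml or yarn-site.xml under /etc/hadoop. The rest of the name is rewritten with `___` to `-`, `__` to `_` and `_` to `.`. The function names here are illustrative.

```python
from typing import Dict, Optional, Tuple

# Assumed (bde2020-style) mapping from env-var prefix to generated config file
PREFIX_TO_FILE = {
    "CORE_CONF_": "core-site.xml",
    "HDFS_CONF_": "hdfs-site.xml",
    "YARN_CONF_": "yarn-site.xml",
}


def env_to_property(env_name: str) -> Optional[Tuple[str, str]]:
    """Map e.g. CORE_CONF_fs_defaultFS to ("core-site.xml", "fs.defaultFS")."""
    for prefix, filename in PREFIX_TO_FILE.items():
        if env_name.startswith(prefix):
            key = env_name[len(prefix):]
            # Order matters: '___' first, then '__' (parked as a placeholder
            # so it survives the single-underscore pass), then '_'.
            key = key.replace("___", "-").replace("__", "\0").replace("_", ".")
            return filename, key.replace("\0", "_")
    return None  # not a Hadoop config variable (e.g. CLUSTER_NAME)


def parse_env_file(text: str) -> Dict[str, Dict[str, str]]:
    """Group the Hadoop properties of an env file by target *-site.xml file."""
    result: Dict[str, Dict[str, str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks and comments
        name, value = line.split("=", 1)
        mapped = env_to_property(name)
        if mapped is not None:
            filename, key = mapped
            result.setdefault(filename, {})[key] = value
    return result
```

For example, HDFS_CONF_dfs_blocksize becomes dfs.blocksize in hdfs-site.xml, and YARN_CONF_yarn_log___aggregation___enable becomes yarn.log-aggregation-enable. If the assumption holds, the generated files live in /etc/hadoop inside the namenode container. That fits with sharing that directory through a volume.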
Finally, as @cricket_007 pointed out, I needed to configure hadoop.conf.dir.
That directory should contain hdfs-site.xml.
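A quick way to confirm that the directory Kafka Connect sees really contains a usable hdfs-site.xml is to parse it and list its properties. This is a minimal sketch: the helper name read_hadoop_site is hypothetical, and it only assumes the standard Hadoop layout of a `<configuration>` root holding `<property>` elements with `<name>` and `<value>` children.

```python
import os
import xml.etree.ElementTree as ET
from typing import Dict


def read_hadoop_site(conf_dir: str, filename: str = "hdfs-site.xml") -> Dict[str, str]:
    """Return the name -> value pairs of a Hadoop *-site.xml file in conf_dir.

    Raises FileNotFoundError if the file is missing, i.e. when the directory
    is not the one that holds the Hadoop configuration.
    """
    path = os.path.join(conf_dir, filename)
    root = ET.parse(path).getroot()  # the <configuration> element
    props: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Hypothetical usage inside the kafka-connect container:
#   print(read_hadoop_site("/usr/local/hadoop-conf"))
```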
Because every service runs in its own container, I needed a named volume to share the configuration files between the kafka-connect service and the namenode service.
To do this, I added the following to docker-compose.yml:
volumes:
  hadoopconf:
Then, for the namenode service, I added:
volumes:
  - hadoopconf:/etc/hadoop
And for the kafka-connect service:
volumes:
  - hadoopconf:/usr/local/hadoop-conf
Finally, I set hadoop.conf.dir in the HDFS sink properties file to /usr/local/hadoop-conf.
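With that change, the sink configuration is the original properties file plus hadoop.conf.dir=/usr/local/hadoop-conf. If the Connect worker runs in distributed mode rather than standalone, the same settings would be submitted as JSON to the worker's REST API: a POST to /connectors, on port 8083 by default, with a top-level "name" and a "config" map of string values. The sketch below turns the properties into that request body. The parser is deliberately minimal: it handles plain key=value lines only, without continuations or escapes. The helper names are illustrative.

```python
import json
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal .properties reader: key=value lines; '#' and '!' comments skipped."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            props[key.strip()] = value.strip()
    return props


def connector_payload(props: Dict[str, str]) -> str:
    """Build the JSON body for POST /connectors from flat connector properties."""
    config = dict(props)  # copy so the caller's dict is left untouched
    name = config.pop("name")
    return json.dumps({"name": name, "config": config}, indent=2)


HDFS_SINK_PROPERTIES = """\
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=MyTopic
hdfs.url=hdfs://namenode:8020
flush.size=3
hadoop.conf.dir=/usr/local/hadoop-conf
"""

if __name__ == "__main__":
    print(connector_payload(parse_properties(HDFS_SINK_PROPERTIES)))
```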