I tried different Docker images for Hadoop containers, but none of them work when I try to write files to HDFS. I always get this error:
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /t/_temporary/0/_temporary/attempt_20200528153700_0001_m_000006_7/part-00006-34c8bc6d-68a3-4177-bfbf-5f225b28c157-c000.snappy.parquet could only be replicated to 0 nodes instead of minReplication (=1). There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
What have I tried so far?
I'm running the app on my local computer, and the Docker containers are running locally as well.
After creating a basic DataFrame, I try to write it:
df.write.save('hdfs://hadoop-master:9000/t', format='parquet', mode='append')
It takes almost 2 minutes, then throws the error above.
The WebUI is fine, and I can put files into HDFS using commands inside the container.
It seems like a network/connection problem to me, but I couldn't track it down.
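To check the network theory, one thing I can do is verify that the client machine can actually open TCP connections to the cluster. This is a diagnostic sketch: the datanode hostname `hadoop-slave1` and the transfer port (9866 on Hadoop 3.x, 50010 on 2.x) are assumptions about a typical Docker Hadoop setup, not taken from my containers; only the namenode address comes from the URI above.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False

# hadoop-master:9000 is the namenode RPC address from the write() URI;
# hadoop-slave1:9866 is an assumed datanode name and default transfer port.
for host, port in [("hadoop-master", 9000), ("hadoop-slave1", 9866)]:
    print(f"{host}:{port} reachable: {can_connect(host, port)}")
```

The key point is that the HDFS client talks to the datanodes directly when writing blocks, so the namenode being reachable is not enough; every datanode's transfer port must be reachable from the client too.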
I didn't solve the problem, but I found a quick workaround.
TL;DR
macOS may cause this problem.
I built a new Debian server on GCP, installed Docker, the Hadoop images, and the Python code I had already tested. There it worked fine, but I still get the error when I try to connect from my local machine.
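If the cause really is Docker-for-Mac networking (the datanodes advertising container-internal IPs that the host cannot reach), a commonly suggested client-side setting is `dfs.client.use.datanode.hostname`, which makes the client connect to datanodes by hostname instead of the advertised IP. This is an assumption about the cause, not something I have verified in this setup, and it also requires the datanode hostnames to resolve (e.g. via `/etc/hosts`) and their ports to be published:

```xml
<!-- hdfs-site.xml on the client side (hedged sketch, unverified here) -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
```

With PySpark the same property can be passed without editing XML, as the Spark config key `spark.hadoop.dfs.client.use.datanode.hostname`.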
I still need a proper answer, but I'm sharing this for anyone who needs a quick solution.