
Spark standalone cluster read parquet files after saving

I have a two-node Spark standalone cluster and I'm trying to read some parquet files that I just saved, but I am getting a file-not-found exception.

Checking the location, it looks like all the parquet files were created on only one of the nodes in my standalone cluster.

The problem is that when reading the parquet files back, Spark says it cannot find the xasdad.part file.

The only way I manage to load them is to scale the standalone Spark cluster down to one node.

My question is: how can I load my parquet files while running more than one node in my standalone cluster?

You have to put your files on a shared directory that is accessible to all Spark nodes under the same path. Otherwise, use Spark with Hadoop HDFS, a distributed file system.
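A minimal sketch of what this looks like in practice, assuming Spark 2.x+ and an HDFS namenode reachable at hdfs://namenode:9000 (the host, port, and path below are placeholders, not from the original question):

```scala
import org.apache.spark.sql.SparkSession

object ParquetOnSharedStorage {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-shared-storage")
      .getOrCreate()

    import spark.implicits._
    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

    // Write to a path that every executor can resolve. With a node-local path
    // such as file:///data/out, each worker keeps only the part files it wrote,
    // so a later read from the driver or another worker fails with
    // FileNotFoundException. An HDFS (or NFS/S3) path avoids this.
    df.write.mode("overwrite").parquet("hdfs://namenode:9000/data/out")

    // Reading back works from any node, because the path points to the same
    // distributed (or shared) storage everywhere.
    val readBack = spark.read.parquet("hdfs://namenode:9000/data/out")
    readBack.show()

    spark.stop()
  }
}
```

The same pattern applies to a shared mount (e.g. NFS): as long as the identical path resolves to the same data on every node, the read will succeed regardless of which worker wrote each part file.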
