
Spark: read a file on each node, similar to Hadoop's DistributedCache

Original post 2017-04-12 08:15:52 4 2 file/ apache-spark/ slave

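I have a file on the master node that every node should read. How can I do this? In Hadoop MapReduce I used

DistributedCache.getLocalCacheFiles(context.getConfiguration())

How does Spark share files between nodes? Do I have to load the file into RAM and broadcast it as a variable? Or can I just give the file path (absolute?) in the SparkContext configuration and have it immediately available on all nodes?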

2 answers

You can use SparkFiles to read the file from the distributed cache.

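import org.apache.spark.SparkFiles
import org.apache.hadoop.fs.Path

// Register the file so Spark copies it to every node that runs this job
sc.addFile("/path/to/file.txt")
// Resolve the node-local copy by its file name, not its original path
val pathOnWorkerNode = new Path(SparkFiles.get("file.txt"))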
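
To make the snippet above concrete, here is a minimal sketch of reading the shipped file from inside tasks. It assumes a local master (`local[2]`) and a throwaway temp file so it can run on one machine; the object name `SparkFilesDemo` and the helper `lineCountsPerTask` are made up for illustration and are not part of the original answer.

```scala
import java.io.File
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}
import scala.io.Source

object SparkFilesDemo {
  // Hypothetical helper: ships `localPath` with addFile and returns the
  // number of lines each task found in its node-local copy.
  def lineCountsPerTask(sc: SparkContext, localPath: String, numTasks: Int): Array[Int] = {
    sc.addFile(localPath)
    val fileName = new File(localPath).getName
    sc.parallelize(1 to numTasks, numTasks).map { _ =>
      // Runs inside a task: look the file up by name via SparkFiles
      val src = Source.fromFile(SparkFiles.get(fileName))
      try src.getLines().size finally src.close()
    }.collect()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, only so the sketch is runnable without a cluster
    val conf = new SparkConf().setAppName("spark-files-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val tmp = Files.createTempFile("lookup", ".txt")
      Files.write(tmp, "a\nb\nc\n".getBytes("UTF-8"))
      println(lineCountsPerTask(sc, tmp.toString, 4).mkString(","))
    } finally sc.stop()
  }
}
```

The point of the design is that the task never sees the driver-side path: it only knows the file name and lets SparkFiles.get resolve wherever Spark placed the copy on that node.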

See, for example, the spark-submit "files" parameter (--files):

Running Spark jobs on a YARN cluster with additional files
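
As a rough sketch of the --files route, the job below reads the shipped file inside tasks only. The submit command in the comment, the class name `ReadShippedFile`, the jar name `app.jar` and the helper `countLines` are all placeholders, and the sketch assumes the file was passed as `--files /path/to/file.txt`; whether the driver can also resolve the file depends on the deploy mode, so the driver does not touch it here.

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}
import scala.io.Source

// Hypothetical submit command; class and jar names are placeholders:
//   spark-submit --master yarn --deploy-mode cluster \
//     --files /path/to/file.txt --class ReadShippedFile app.jar
object ReadShippedFile {
  // Counts the lines of a local file and always closes the handle
  def countLines(path: String): Int = {
    val src = Source.fromFile(path)
    try src.getLines().size finally src.close()
  }

  def main(args: Array[String]): Unit = {
    // The master is expected to come from spark-submit, so none is set here
    val sc = new SparkContext(new SparkConf().setAppName("read-shipped-file"))
    try {
      val perTask = sc
        .parallelize(1 to 4, 4)
        // Each task resolves its own local copy by file name
        .map(_ => countLines(SparkFiles.get("file.txt")))
        .collect()
      println(perTask.mkString(","))
    } finally sc.stop()
  }
}
```

Compared with calling sc.addFile in code, this keeps the file path out of the application and in the submit command, which may be convenient when the path differs between environments.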
