
MPI read and write to HDFS

Does anyone know a good way to read and write files on HDFS from within MPI? I've done a fair amount of digging trying to figure this out, and I just need a general direction to pursue.

The MPI Standard has a full chapter on MPI I/O. I'd start by reading that.

MPI implementations provide MPI-IO, usually by way of ROMIO. It's worth taking a look at that as well.

HDFS has some oddities that make it an interesting target for MPI-IO. Foremost among them is the restriction on modifications (writes) from more than one process.

It looks like the PLFS project (which takes MPI-IO style "all write to one file" workloads and turns them into "one file per process" workloads) has made HDFS one of its targets. This paper (with a whopping two citations) appears to be the reference: http://www.pdl.cmu.edu/PDL-FTP/HECStorage/CMU-PDL-12-115.pdf
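To make the "all write to one file" to "one file per process" idea concrete, here is a minimal Python sketch. It is not PLFS's actual on-disk format or API: the class name `PerWriterLogFile`, the `data.<rank>` file naming, and the in-memory index are all assumptions made for illustration, and the ranks are simulated in a single process rather than run under MPI. Each writer only ever appends to its own log, which is the access pattern a single-writer file system can tolerate, and an index remembers where each piece belongs in the logical shared file.

```python
import os
import tempfile


class PerWriterLogFile:
    """Toy container: one append-only data log per writer plus an index."""

    def __init__(self, container_dir):
        self.container_dir = container_dir
        os.makedirs(container_dir, exist_ok=True)
        # Each entry: (logical_offset, length, rank, offset_in_log).
        # Kept in memory here; a real system would have to persist it.
        self.index = []

    def _log_path(self, rank):
        # Hypothetical naming scheme: one log file per writer rank.
        return os.path.join(self.container_dir, f"data.{rank}")

    def write_at(self, rank, logical_offset, data):
        """Append data to this rank's private log and record where it belongs."""
        with open(self._log_path(rank), "ab") as log:
            log.seek(0, os.SEEK_END)
            physical_offset = log.tell()
            log.write(data)
        self.index.append((logical_offset, len(data), rank, physical_offset))

    def read_all(self):
        """Rebuild the logical file by replaying index entries in write order."""
        size = max((off + n for off, n, _, _ in self.index), default=0)
        out = bytearray(size)
        for logical_offset, length, rank, physical_offset in self.index:
            with open(self._log_path(rank), "rb") as log:
                log.seek(physical_offset)
                out[logical_offset:logical_offset + length] = log.read(length)
        return bytes(out)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        shared = PerWriterLogFile(os.path.join(tmp, "shared.out"))
        # Four simulated ranks each write a 4-byte block at offset rank * 4,
        # deliberately out of order.
        for rank in (2, 0, 3, 1):
            shared.write_at(rank, rank * 4, bytes([ord("A") + rank]) * 4)
        return shared.read_all()
```

Calling `demo()` reassembles the four per-rank logs into one 16-byte logical file. The cost of this design shows up on the read side, where every read has to consult the index.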

So you'd have the MPI-IO interface, implemented by ROMIO. ROMIO has a device abstraction layer called ADIO, and PLFS can be one of those underlying devices (if you patch ROMIO). PLFS then speaks HDFS, and you finally perform I/O.
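The layering can be pictured with a small Python sketch. Nothing here is ROMIO or ADIO code: `Driver`, `MemoryDriver`, and `FrontEnd` are hypothetical names, the backends are in-memory stand-ins, and the `plfs` prefix is only an assumption modeled on the way ROMIO can pick a driver from a prefix on the file name (for example `ufs:`). The point is just that the front end keeps one interface while the driver underneath is swapped.

```python
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical stand-in for an ADIO-style file system driver."""

    @abstractmethod
    def write_at(self, path, offset, data): ...

    @abstractmethod
    def read_at(self, path, offset, length): ...


class MemoryDriver(Driver):
    """Trivial backend that keeps file contents in a dict."""

    def __init__(self):
        self.files = {}

    def write_at(self, path, offset, data):
        buf = self.files.setdefault(path, bytearray())
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(b"\0" * (end - len(buf)))  # zero-fill any gap
        buf[offset:end] = data

    def read_at(self, path, offset, length):
        return bytes(self.files.get(path, b"")[offset:offset + length])


class FrontEnd:
    """Stand-in for the MPI-IO layer: picks a driver from the path prefix."""

    def __init__(self, drivers, default):
        self.drivers = drivers
        self.default = default

    def _resolve(self, name):
        prefix, sep, rest = name.partition(":")
        if sep and prefix in self.drivers:
            return self.drivers[prefix], rest
        return self.drivers[self.default], name

    def write_at(self, name, offset, data):
        driver, path = self._resolve(name)
        driver.write_at(path, offset, data)

    def read_at(self, name, offset, length):
        driver, path = self._resolve(name)
        return driver.read_at(path, offset, length)


# Usage: the same call goes to a different backend depending on the prefix.
io = FrontEnd({"ufs": MemoryDriver(), "plfs": MemoryDriver()}, default="ufs")
io.write_at("plfs:/out", 4, b"data")
```

In the real stack the `plfs` slot would be the patched-in PLFS driver, which in turn would talk to HDFS instead of a dict.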

I have no idea how performant this stack is!

