
Copy large data file using parallel I/O

I have a fairly big data set, about 141 million lines in .csv format. I want to use MPI commands with C++ to copy and manipulate a few columns, but I'm a newbie at both C++ and MPI.

So far my code looks like this:

#include <stdio.h>
#include "mpi.h"

using namespace std;

int main(int argc, char **argv)
{
    int i, rank, nprocs, N=4;
    MPI_File fp, fpwrite;   // input and output file handles
    MPI_Status status;
    MPI_Offset filesize, offset;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int buf[N];
    for (i = 0; i < N; i++)
        buf[i] = i;

    MPI_File_open(MPI_COMM_WORLD, "new.csv", MPI_MODE_RDONLY, MPI_INFO_NULL, &fp);
    MPI_File_open(MPI_COMM_WORLD, "Ntest.csv", MPI_MODE_CREATE|MPI_MODE_WRONLY, MPI_INFO_NULL, &fpwrite);

    // The file size can only be queried after the file has been opened
    MPI_File_get_size(fp, &filesize);

    // Each rank writes its share of the buffer at a rank-specific offset
    offset = rank * (N/nprocs) * sizeof(int);

    MPI_File_read(fp, buf, N, MPI_INT, &status);

    printf("My rank is: %d\n", rank);
    MPI_File_write_at(fpwrite, offset, buf, N/nprocs, MPI_INT, &status);

    MPI_File_close(&fp);
    MPI_File_close(&fpwrite);
    MPI_Finalize();
    return 0;
}

I'm not sure where to start, and I've seen a few examples with Lustre stripes. I would like to go in that direction if possible. Additional options include HDF5 and T3PIO.

You are way too early to worry about Lustre stripes, aside from the fact that Lustre stripes are by default ridiculously small for a "parallel file system". Increase the stripe size of the directory where you will write and read these files with lfs setstripe.
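For example (a sketch only; the directory path is a placeholder, and the right stripe size and count depend on your system and file size):

```shell
# Stripe new files in this directory 4 MiB at a time across 8 OSTs.
# Files created in the directory afterwards inherit this layout.
lfs setstripe -S 4m -c 8 /path/to/output_dir
```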

Your first challenge will be how to decompose this CSV file. What does a typical row look like? If the rows are of variable length, you're going to have a bit of a headache. Here's why:

Consider a CSV file with 3 rows and 3 MPI processes.

  1. One row is `aa,b,c` (8 bytes).
  2. The second row is `aaaaaaa,bbbbbbb,ccccccc` (24 bytes).
  3. The third row is `,,c` (4 bytes).

(darnit, markdown, how do I make this list start at zero?)

Rank 0 can read from the beginning of the file, but where will ranks 1 and 2 start? If you simply divide the total size (8 + 24 + 4 = 36) by 3, then the decomposition is:

  1. Rank 0 ends up reading `aa,b,c\naaaaaa`,
  2. rank 1 reads `a,bbbbbbb,ccc`, and
  3. rank 2 reads `cccc\n,,c\n`.

There are two approaches to unstructured text input. One option is to index your file, either after the fact or as the file is being generated. This index would store the beginning offset of every row. Rank 0 reads the index, then broadcasts it to everyone else.

The second option is to do the initial decomposition by file size, then fix up the splits. In the simple example above, rank 0 would send everything after the newline to rank 1. Rank 1 would receive the new data, glue it to the beginning of its row, and send everything after its own newline to rank 2. This is extremely fiddly, and I would not suggest it for someone just starting with MPI-IO.

HDF5 is a good option here! Instead of trying to write your own parallel CSV parser, have your CSV creator generate an HDF5 dataset. HDF5, among other features, will keep that index I mentioned for you, so you can set up hyperslabs and do parallel reading and writing.
