
Hadoop: synchronization while updating a file in HDFS in the map stage using FileSystem.append()?

I'm wondering whether Hadoop provides any synchronization protection when multiple nodes try to access the same file on HDFS using the FileSystem append(Path p) method.

I append values to a text file in the mapper stage of my jobs, and I was wondering what would happen if two mappers tried to access the same file at the same time. I don't want them to overwrite each other; preferably, each node would wait for access until the others are done, so that only one node has the file open at any one time.

        // try-with-resources closes the writer even if the append fails
        try (BufferedWriter br = new BufferedWriter(new OutputStreamWriter(fs.append(new Path(tempFilePath))))) {
            br.append("value");
        }

I know this isn't proper MapReduce, but for some of the jobs I'm running I have no other choice, as I need to store some text values independently of the final output.

According to the FAQ:

HDFS supports exclusive writes only.

When the first client contacts the name-node to open the file for writing, the name-node grants a lease to the client to create this file. When the second client tries to open the same file for writing, the name-node will see that the lease for the file is already granted to another client, and will reject the open request for the second client.
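To make the FAQ's wording concrete, here is a toy, in-memory model of that single-writer lease. It is not Hadoop code: `LeaseSketch`, `openForWrite`, and `close` are hypothetical names invented for illustration, and the real name-node does considerably more (lease renewal, expiry, recovery). The point it shows is that the second writer is rejected with an exception instead of being queued.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Toy model of the exclusive write lease described in the FAQ. Not the Hadoop API. */
final class LeaseSketch {
    // path -> name of the client currently holding the write lease
    private final Map<String, String> leaseHolders = new HashMap<>();

    /** Grants the lease on path to client, or rejects the request if it is already held. */
    synchronized void openForWrite(String path, String client) throws IOException {
        String holder = leaseHolders.putIfAbsent(path, client);
        if (holder != null) {
            throw new IOException("lease on " + path + " is already held by " + holder);
        }
    }

    /** Releases the lease, but only if client is the current holder. */
    synchronized void close(String path, String client) {
        leaseHolders.remove(path, client);
    }

    public static void main(String[] args) throws IOException {
        LeaseSketch nameNode = new LeaseSketch();
        nameNode.openForWrite("/tmp/values.txt", "mapper-1");
        try {
            nameNode.openForWrite("/tmp/values.txt", "mapper-2");
        } catch (IOException e) {
            System.out.println("mapper-2 rejected: " + e.getMessage());
        }
        nameNode.close("/tmp/values.txt", "mapper-1");
        nameNode.openForWrite("/tmp/values.txt", "mapper-2"); // succeeds once the lease is free
        System.out.println("mapper-2 now holds the lease");
    }
}
```

Under this model, "waiting for the other node to finish" is not something the file system does for you; a rejected client would have to catch the exception and retry on its own.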

I don't know whether fs.append blocks, but in your case the best solution is to use MultipleOutputs (documentation). This lets you write data independently of the final output.
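A minimal sketch of what that could look like with the `org.apache.hadoop.mapreduce` API, assuming the Hadoop client libraries are on the classpath. The class name `SideOutputMapper`, the helper `configure`, the named output `side`, and the key/value types are all assumptions made for illustration; adapt them to your job.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class SideOutputMapper extends Mapper<LongWritable, Text, Text, Text> {
    // Hypothetical output name; named outputs must be alphanumeric.
    static final String SIDE = "side";

    private MultipleOutputs<Text, Text> mos;

    /** Call once from the driver before submitting the job. */
    static void configure(Job job) {
        MultipleOutputs.addNamedOutput(job, SIDE, TextOutputFormat.class,
                Text.class, NullWritable.class);
    }

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<>(context);
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Side value: goes to this task's own output file, so no file is shared between mappers.
        mos.write(SIDE, new Text("value"), NullWritable.get());
        // The regular job output is written as usual.
        context.write(new Text("key"), line);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close(); // flushes and closes the side output
    }
}
```

Each map task then writes its side values to a separate file in the job's output directory (named along the lines of `side-m-00000`), which sidesteps the lease conflict entirely because no two tasks ever open the same file.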
