
Hadoop (HDFS) - file versioning

At the moment I have a user file system in my application (Apache CMIS). As it grows bigger, I'm considering moving to Hadoop (HDFS), since we also need to run some statistics on it. The problem: the current file system provides versioning of the files. When I read about Hadoop (HDFS) and file versioning, I mostly found that I would have to write this versioning layer myself. Is there already something available to manage versioning of files in HDFS, or do I really have to write it myself? (I don't want to reinvent the wheel, but I can't find a proper solution either.)

Answer

For full details, see the comments on the answer(s) below.

Hadoop (HDFS) doesn't support versioning of files. You can get this functionality by combining Hadoop with Amazon S3: Hadoop then uses S3 as its file system (files are not split into HDFS blocks; durability and recovery are provided by S3 instead). This setup comes with the file versioning that S3 provides, while Hadoop still uses YARN for the distributed processing.
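A minimal sketch of the S3 side of this setup, assuming Python and a boto3 S3 client passed in by the caller (e.g. created with `boto3.client("s3")`); the bucket name and paths are hypothetical. It turns on bucket versioning with `put_bucket_versioning` and maps an `s3a://` path, the URI scheme used by Hadoop's S3A connector, to the bucket and key that the S3 API expects.

```python
from urllib.parse import urlparse


def split_s3a_uri(uri):
    """Split an s3a://bucket/key URI (as used by Hadoop's S3A connector)
    into the (bucket, key) pair used by the S3 API."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3a" or not parsed.netloc:
        raise ValueError(f"not an s3a:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def enable_versioning(s3_client, bucket):
    """Turn on object versioning for a bucket.

    s3_client is expected to be a boto3 S3 client. Once versioning is
    enabled, overwriting a key keeps the previous version, and deleting a
    key only adds a delete marker instead of removing older versions.
    """
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    # Read the setting back so the caller can confirm it took effect.
    return s3_client.get_bucket_versioning(Bucket=bucket).get("Status")
```

For example, a Hadoop job writing to `s3a://my-bucket/docs/report.txt` touches the key `docs/report.txt` in `my-bucket`; as far as I know, the job itself always sees the current version, and older versions are reached through the S3 API.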

Versioning is not possible with HDFS.
Instead, you can use Amazon S3, which provides versioning and is also compatible with Hadoop.
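Once versioning is enabled on a bucket, older copies of a file can be listed and read back. A minimal sketch, again assuming a boto3 S3 client is passed in; the bucket and key names are hypothetical. It uses `list_object_versions` (filtering out other keys that merely share the prefix, and following the pagination markers) and `get_object` with a `VersionId`.

```python
def list_versions(s3_client, bucket, key):
    """Return the stored versions of one key, in S3's listing order
    (newest first). Each entry is a dict with VersionId, IsLatest, etc."""
    versions = []
    params = {"Bucket": bucket, "Prefix": key}
    while True:
        page = s3_client.list_object_versions(**params)
        # Prefix=key also matches e.g. "key.bak", so keep exact matches only.
        versions.extend(v for v in page.get("Versions", []) if v["Key"] == key)
        if not page.get("IsTruncated"):
            return versions
        # Continue the listing where the previous page stopped.
        params["KeyMarker"] = page["NextKeyMarker"]
        params["VersionIdMarker"] = page["NextVersionIdMarker"]


def read_version(s3_client, bucket, key, version_id):
    """Download the bytes of one specific (possibly older) version."""
    response = s3_client.get_object(Bucket=bucket, Key=key, VersionId=version_id)
    return response["Body"].read()
```

With a real client you would call e.g. `list_versions(boto3.client("s3"), "my-bucket", "docs/report.txt")` and pass one of the returned `VersionId` values to `read_version` to recover an earlier copy of the file.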

HDFS supports snapshots. I think that's as close as you can get to "versioning" with HDFS.
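A sketch of how snapshot-based "versioning" could be driven from Python, assuming the `hdfs` command-line client is on the PATH and you have admin rights to make the directory snapshottable; the directory and snapshot names are hypothetical. Snapshots are read-only and are exposed under the directory's `.snapshot` subdirectory, so restoring an older copy means copying it back out.

```python
import posixpath
import subprocess


def snapshot_commands(directory, snapshot_name):
    """hdfs CLI invocations that make `directory` snapshottable (an admin
    operation, needed once) and then take a snapshot named snapshot_name."""
    return [
        ["hdfs", "dfsadmin", "-allowSnapshot", directory],
        ["hdfs", "dfs", "-createSnapshot", directory, snapshot_name],
    ]


def snapshot_path(directory, snapshot_name, file_path):
    """Path under which file_path is visible in the given snapshot:
    <directory>/.snapshot/<snapshot_name>/<path relative to directory>."""
    rel = posixpath.relpath(file_path, directory)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{file_path!r} is not inside {directory!r}")
    root = posixpath.join(directory, ".snapshot", snapshot_name)
    return root if rel == "." else posixpath.join(root, rel)


def restore_command(directory, snapshot_name, file_path):
    """Snapshots are read-only, so 'rolling back' a file means copying the
    snapshotted copy over the current one (-f overwrites the destination)."""
    return ["hdfs", "dfs", "-cp", "-f",
            snapshot_path(directory, snapshot_name, file_path), file_path]


def run(commands):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run(snapshot_commands("/user/docs", "v1"))` after each batch of changes gives you named, point-in-time copies, and `hdfs snapshotDiff` can show what changed between two of them. Unlike S3 versioning, this captures the whole directory at moments you choose rather than every individual write.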

