
DistCp fault tolerance between two remote clusters

I need to copy a directory from one cluster to another with a similar HDFS (both are MapR clusters).

I am planning to use the DistCp Java API, but I want to avoid duplicate copies of files in the directory. Are these operations fault tolerant? That is, if a file is not copied completely because the connection is lost, does DistCp start the copy again so that the file is copied properly?

DistCp uses MapReduce to effect its distribution, error handling and recovery, and reporting.

Please see "Update and Overwrite" in the DistCp guide.

You can use the -overwrite option to avoid duplicates: it replaces files that already exist at the target. You can also check the -update option, which copies only files that are missing from the target or differ from the source. If the network connection fails, you can re-initiate the copy with the -overwrite option once the connection has recovered.

See the examples of -update and -overwrite in the guide linked above.

Here is the link for the refactored DistCp: https://hadoop.apache.org/docs/r2.7.2/hadoop-distcp/DistCp.html
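
To make the choice between the two options concrete, here is a minimal Python sketch that only assembles the `hadoop distcp` command line and does not run it. The helper name `build_distcp_command` and the two cluster URIs are hypothetical placeholders, not values from the question; the sketch passes at most one of -update or -overwrite per run.

```python
from typing import List, Optional

# Placeholder URIs for illustration only; substitute the real cluster paths.
SOURCE = "hdfs://nn1:8020/source/dir"
TARGET = "hdfs://nn2:8020/target/dir"


def build_distcp_command(source: str, target: str,
                         mode: Optional[str] = None) -> List[str]:
    """Assemble the argument list for `hadoop distcp`.

    mode is None (plain copy), "update" or "overwrite".
    """
    if mode not in (None, "update", "overwrite"):
        raise ValueError("mode must be None, 'update' or 'overwrite'")
    command = ["hadoop", "distcp"]
    if mode is not None:
        # -update copies only missing or differing files;
        # -overwrite replaces files that already exist at the target.
        command.append("-" + mode)
    command.extend([source, target])
    return command


if __name__ == "__main__":
    # Print the command that would be run for an incremental copy.
    print(" ".join(build_distcp_command(SOURCE, TARGET, "update")))
```

Note that, per the guide's "Update and Overwrite" section, both options also change how the source path is interpreted: the contents of the source directory are copied into the target rather than the directory itself, so it is worth checking the resulting layout on a small test directory first.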

As "@RamPrasad G" mentioned, I guess you have no option other than to redo the distcp in case of a network failure.

Some good reads:

Hadoop distcp network failures with WebHDFS

http://www.ghostar.org/2015/08/hadoop-distcp-network-failures-with-webhdfs/

Distcp between two HA Cluster

http://henning.kropponline.de/2015/03/15/distcp-two-ha-cluster/

Transferring Data to/from Altiscale via S3 using DistCp

https://documentation.altiscale.com/transferring-data-using-distcp This page has a link to a shell script with retry, which could be helpful to you.
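
The "redo the distcp" advice can be automated with a small retry wrapper. The following is a rough Python sketch, not the script from the page above: the function name `run_with_retries`, the attempt count and the wait time are all illustrative assumptions, and it assumes the `hadoop distcp` command exits with a non-zero status when the copy fails.

```python
import subprocess
import time
from typing import Callable, List, Optional


def run_with_retries(command: List[str],
                     max_attempts: int = 3,
                     wait_seconds: float = 60.0,
                     runner: Optional[Callable[[List[str]], int]] = None,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run a command until it exits with status 0 or attempts run out.

    runner returns the exit code for a command; by default it launches
    the command as a subprocess. Returns True on success, False otherwise.
    """
    if runner is None:
        runner = lambda cmd: subprocess.run(cmd).returncode
    for attempt in range(1, max_attempts + 1):
        if runner(command) == 0:
            return True
        if attempt < max_attempts:
            # Give the network some time to recover before retrying.
            sleep(wait_seconds)
    return False


# Example call (paths are placeholders, not values from the question):
# run_with_retries(["hadoop", "distcp", "-update",
#                   "hdfs://nn1:8020/source/dir",
#                   "hdfs://nn2:8020/target/dir"])
```

Rerunning with -update rather than a plain copy means each retry skips files that already arrived intact, which is what keeps the repeated runs from producing duplicate work.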

Note: Thanks to the original authors.
