Hadoop Distcp aborting when copying data from one cluster to another

I am trying to copy the data of a partitioned Hive table from one cluster to another. I am using distcp to copy the data, but the underlying data belongs to a partitioned Hive table. I used the following command:

hadoop distcp -i {src} {tgt}

But because the table is partitioned, the directory structure follows the table's partitions, so distcp reports that the copy would create duplicates and aborts the job:

org.apache.hadoop.tools.CopyListing$DuplicateFileException: File would cause duplicates. Aborting

I also tried -skipcrccheck, -update, and -overwrite, but none of them worked.

How can I copy a table's data from a partitioned file path to the destination?

Try the option -strategy dynamic. By default, distcp uses uniformsize.
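If the copy is launched from a script, a minimal Python sketch of building that command with an explicit strategy might look like this. The helper name `distcp_argv` and the cluster URIs are hypothetical; only the `-i` and `-strategy` flags come from this thread, and the sketch prints the command instead of running it.

```python
import shlex

STRATEGIES = ("uniformsize", "dynamic")  # uniformsize is distcp's default

def distcp_argv(src: str, tgt: str, strategy: str = "dynamic") -> list[str]:
    """Build a `hadoop distcp` argument list with an explicit copy strategy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy {strategy!r}; expected one of {STRATEGIES}")
    # -i (ignore failures) is kept from the question's original command.
    return ["hadoop", "distcp", "-i", "-strategy", strategy, src, tgt]

# Hypothetical source and target paths.
print(shlex.join(distcp_argv("hdfs://src-nn/a", "hdfs://tgt-nn/a")))
```

On a machine with the Hadoop CLI on its PATH, the same list could be passed to `subprocess.run`.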

Check whether the settings below are false, and if so, set them to true:

hive> set hive.mapred.supports.subdirectories;
hive.mapred.supports.subdirectories=false
hive> set mapreduce.input.fileinputformat.input.dir.recursive;
mapreduce.input.fileinputformat.input.dir.recursive=false
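To check this in a script, a small Python helper can read `set` output like the session above and emit the statements that flip every `false` property to `true`. The helper name `enable_statements` is hypothetical; `set key=true;` is ordinary Hive CLI syntax.

```python
def enable_statements(set_output: str) -> list[str]:
    """Return `set key=true;` for every key=value line whose value is false."""
    stmts = []
    for line in set_output.splitlines():
        line = line.strip()
        # Skip the `hive>` prompt lines; keep only the key=value results.
        if line.startswith("hive>") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value.strip().lower() == "false":
            stmts.append(f"set {key.strip()}=true;")
    return stmts

SAMPLE = """\
hive> set hive.mapred.supports.subdirectories;
hive.mapred.supports.subdirectories=false
hive> set mapreduce.input.fileinputformat.input.dir.recursive;
mapreduce.input.fileinputformat.input.dir.recursive=false
"""

for stmt in enable_statements(SAMPLE):
    print(stmt)
```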

hadoop distcp -Dmapreduce.map.memory.mb=20480 -Dmapreduce.map.java.opts=-Xmx15360m \
  -Dipc.client.fallback-to-simple-auth-allowed=true -Ddfs.checksum.type=CRC32C \
  -m 500 -pb -update -delete {src} {target}
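The two memory flags in this command are consistent with each other: the map JVM heap (-Xmx15360m) is exactly 75% of the map container (mapreduce.map.memory.mb=20480), which leaves headroom inside the container for non-heap JVM memory. A short Python sketch of that arithmetic, assuming you want to keep the same 0.75 ratio when changing the container size (the ratio is inferred from this command, not a Hadoop default; the helper names are hypothetical):

```python
def heap_opts(container_mb: int, ratio: float = 0.75) -> str:
    """-Xmx value sized as a fraction of the map container (ratio assumed from the command)."""
    if container_mb <= 0 or not 0 < ratio < 1:
        raise ValueError("container must be positive and ratio must be in (0, 1)")
    return f"-Xmx{int(container_mb * ratio)}m"

def memory_flags(container_mb: int, ratio: float = 0.75) -> list[str]:
    """Generic -D options for the map container size and its JVM heap."""
    return [
        f"-Dmapreduce.map.memory.mb={container_mb}",
        f"-Dmapreduce.map.java.opts={heap_opts(container_mb, ratio)}",
    ]

print(" ".join(memory_flags(20480)))
# -Dmapreduce.map.memory.mb=20480 -Dmapreduce.map.java.opts=-Xmx15360m
print(heap_opts(8192))  # -Xmx6144m
```

Generic -D options like these must come before distcp's own options (-m, -pb, -update, ...) on the command line, as they do in the command above.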

Ideally there can't be duplicate file names. What's happening in your case is that you are copying a partitioned table from one cluster to another, and two differently named partitions contain files with the same name.

The solution is to correct the source path {src} in your command so that it stops at the directory that contains the partition subdirectories, instead of pointing at the individual files inside them.

For example, given this layout:

/a/partcol=1/file1.txt
/a/partcol=2/file1.txt

If you use "/a/*/*" as {src}, you will get the error "File would cause duplicates."

But if you use "/a" as {src}, the copy will not hit this error.
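The difference can be illustrated with a simplified Python model. The real duplicate check is done inside distcp's `CopyListing` (the class named in the exception); the sketch below is not that code. It assumes each matched source is placed under the target by its own name (the behaviour when several sources are given or the target directory already exists). Expanding `/a/*/*` yields two file sources that both map to `file1.txt`, while `/a` keeps the partition directories in every relative path. All helper names are hypothetical.

```python
import fnmatch
import posixpath
from collections import Counter

def _parts(path: str) -> list[str]:
    return path.strip("/").split("/")

def expand_glob(pattern: str, files: list[str]) -> list[str]:
    """Match a glob against the files and their ancestor directories.

    As in a shell or HDFS glob, '*' matches within a single path component.
    """
    candidates = set()
    for f in files:
        parts = _parts(f)
        for i in range(1, len(parts) + 1):
            candidates.add("/" + "/".join(parts[:i]))
    pat = _parts(pattern)
    return sorted(
        c for c in candidates
        if len(_parts(c)) == len(pat)
        and all(fnmatch.fnmatchcase(cp, pp) for cp, pp in zip(_parts(c), pat))
    )

def target_paths(sources: list[str], files: list[str]) -> list[str]:
    """Relative target path of every file copied from each source.

    Simplified model: each source keeps its own name under the target,
    so a source that is a single file contributes only its basename.
    """
    out = []
    for src in sources:
        parent = posixpath.dirname(src)
        for f in files:
            if f == src or f.startswith(src + "/"):
                out.append(posixpath.relpath(f, parent))
    return out

def duplicate_targets(pattern: str, files: list[str]) -> list[str]:
    """Target paths that more than one source file would be written to."""
    counts = Counter(target_paths(expand_glob(pattern, files), files))
    return sorted(p for p, n in counts.items() if n > 1)

# The example layout from the answer above.
FILES = ["/a/partcol=1/file1.txt", "/a/partcol=2/file1.txt"]
print(duplicate_targets("/a/*/*", FILES))  # ['file1.txt'] -> distcp would abort
print(duplicate_targets("/a", FILES))      # [] -> no collision
```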
