
Out of memory issue for Hadoop copyFromLocal

I'm trying to copy a directory containing 1,048,578 files into the HDFS file system, but I get the error below:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2367)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
    at java.lang.StringBuffer.append(StringBuffer.java:237)
    at java.net.URI.appendSchemeSpecificPart(URI.java:1892)
    at java.net.URI.toString(URI.java:1922)
    at java.net.URI.<init>(URI.java:749)
    at org.apache.hadoop.fs.shell.PathData.stringToUri(PathData.java:565)
    at org.apache.hadoop.fs.shell.PathData.<init>(PathData.java:151)
    at org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:273)
    at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
    at org.apache.hadoop.fs.shell.CommandWithDestination.recursePath(CommandWithDestination.java:291)
    at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
    at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
    at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:243)
    at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
    at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
    at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:220)
    at org.apache.hadoop.fs.shell.CopyCommands$Put.processArguments(CopyCommands.java:267)
    at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)

The issue was with the Hadoop client, which builds the entire file listing in the client JVM's heap. It is fixed by disabling the GC overhead limit check and raising the client heap to 4 GB. The following command solved my problem:

export HADOOP_CLIENT_OPTS="-XX:-UseGCOverheadLimit -Xmx4096m"
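As a minimal sketch (the local and HDFS paths are placeholders), the variable must be exported in the same shell session before rerunning the copy, since the FsShell client reads it at startup:

```shell
# -XX:-UseGCOverheadLimit disables the "GC overhead limit exceeded" check;
# -Xmx4096m raises the client JVM's maximum heap to 4 GB.
export HADOOP_CLIENT_OPTS="-XX:-UseGCOverheadLimit -Xmx4096m"

# Retry the upload in the same shell; /data/files and /user/me/files
# are hypothetical paths standing in for the real source and destination.
hadoop fs -copyFromLocal /data/files /user/me/files
```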

Try giving your put (or copyFromLocal) command more heap space. Alternatively, do a less aggressive put operation.

I.e., copy in batches of 1/2, 1/4, or 1/5 of the total data. All of this copying runs on the local machine in a single JVM launched with default settings; you are simply overloading it.
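The batching idea can be sketched as follows. This is one possible approach, not the answerer's exact procedure; the paths, the batch size of 100,000, and the temp-file names are all assumptions:

```shell
# Instead of one recursive copyFromLocal over ~1M files, upload in batches
# so each client JVM only ever handles a fraction of the path list.
# /data/files and /user/me/files are placeholder paths.
cd /data/files

# List every file once, then split the list into chunks of 100,000 names.
find . -type f > /tmp/all_files.txt
split -l 100000 /tmp/all_files.txt /tmp/batch_

# Upload one chunk per hdfs invocation; each run is a fresh client JVM,
# and xargs keeps the argument list within OS limits.
for batch in /tmp/batch_*; do
  xargs -a "$batch" sh -c 'exec hdfs dfs -put "$@" /user/me/files/' _
done
```

Splitting the listing also means a failure partway through only requires re-running the remaining batches, not the whole transfer.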

