Wordlist generator, heap size error in Java

I'm trying to create a program that generates a word list from a small set (10-100) of original input words. The end result contains millions, possibly billions, of lines, with one word on each line. I've come far enough that I can generate up to about 5 million words, but whenever I run something that would generate far more, say 100 million, the program crashes after roughly 1 minute and 9 seconds. Here is the error output:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3210)
    at java.util.Arrays.copyOf(Arrays.java:3181)
    at java.util.ArrayList.grow(ArrayList.java:265)
    at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:239)
    at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:231)
    at java.util.ArrayList.add(ArrayList.java:462)
    at wordlistgen.WordlistGen2.combineWords(WordlistGen2.java:129)
    at wordlistgen.WordlistGen2.main(WordlistGen2.java:25)
    /home/NAME/.cache/netbeans/8.1/executor-snippets/run.xml:53: Java returned: 1
    BUILD FAILED (total time: 1 minute 9 seconds)

I have tried to increase the heap size for NetBeans by entering -J-Xms1024m -J-Xmx2048m in my netbeans.conf file (running Ubuntu 17.10), but the error persists.
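One way to check whether a heap setting actually reaches the JVM that runs the program is to print the limit at startup. The -J options in netbeans.conf are passed to the JVM that runs the IDE itself, and a project launched from the IDE typically runs in a separate JVM, so the two limits may differ. The following is a minimal sketch; the class name HeapCheck and the method maxHeapMiB are hypothetical and not part of the original code.

```java
class HeapCheck {
    // Maximum heap this JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMiB() + " MiB");
    }
}
```

If the printed value does not reflect the 2048 MB setting, the option is probably not reaching the program's JVM. In that case, passing -Xmx2048m directly to the java command that runs the program is the usual way to raise the limit.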

Essentially, the program first imports the original 10-100 words:

    // Reads the input words, one per line, into a new list and
    // registers that list in the shared listOfLists field.
    static void importList() throws IOException {
        ArrayList<String> rawList = new ArrayList<>();

        try (BufferedReader br = new BufferedReader(new FileReader("textfile"))) {
            for (String line; (line = br.readLine()) != null; ) {
                rawList.add(line);
            }

            listOfLists.add(rawList);
            loll++;
        }
    }

Then, with a bunch of for loops, I create new variations of the words: capitalized letters, numbers at the end, substrings of the entire word, and so on. The words are stored in different ArrayLists, which are in turn stored in an ArrayList of ArrayLists, that is, in an ArrayList<ArrayList<String>>.
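The kind of variation described above might look roughly like the following. This is only an illustrative sketch: the real loops live in the linked code and are not shown here, so the class VariantSketch, the method variants, and the exact set of variations (first letter capitalized, one trailing digit, proper prefixes) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class VariantSketch {
    // Builds a few simple variations of one input word (hypothetical rules).
    static List<String> variants(String word) {
        List<String> out = new ArrayList<>();
        if (word.isEmpty()) {
            return out;
        }
        out.add(word);
        // First letter capitalized.
        out.add(Character.toUpperCase(word.charAt(0)) + word.substring(1));
        // A single digit appended at the end.
        for (int d = 0; d <= 9; d++) {
            out.add(word + d);
        }
        // Proper prefixes of the word (substrings starting at index 0).
        for (int end = 1; end < word.length(); end++) {
            out.add(word.substring(0, end));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String v : variants("cat")) {
            System.out.println(v);
        }
    }
}
```

Even with these few rules a three-letter word yields 14 entries, which shows how quickly the lists grow once variations are combined with each other.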

When I'm done combining and manipulating words, I write the entire final ArrayList, line by line, to an output file using the following method:

    // Writes every word in finalList to the given file, one word per line.
    static void outputFile(String fileName) throws IOException {
        try (FileWriter writer = new FileWriter(fileName)) {
            for (String str : finalList) {
                writer.write(str + "\n");
            }
        }
    }

The entire code can be found here: https://pastebin.com/0fkvwYbx

I'm hoping that I'm missing something obvious, or that I've misinterpreted the error message. Either way, if someone could find a solution that lets me generate longer lists, I'd be very grateful.

Maybe ArrayList is not the appropriate List implementation for your problem. Please see: When to use LinkedList over ArrayList?

I think you are constantly hitting the worst-case scenario described there (quoting):

add(E element) is O(1) amortized, but O(n) worst-case since the array must be resized and copied
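The amortized versus worst-case behaviour in the quote can be illustrated with a small simulation. This is a sketch that assumes the growth policy used by OpenJDK 8's ArrayList (capacity 10 to start, then roughly 50% larger on each resize). GrowthSketch and simulate are hypothetical names, and the code only models the bookkeeping; it does not inspect a real ArrayList.

```java
class GrowthSketch {
    // Simulates `adds` appends and returns {resizes, elementsCopied, finalCapacity}.
    static long[] simulate(int adds) {
        int capacity = 10;   // assumed initial capacity
        int size = 0;
        long resizes = 0;
        long copied = 0;
        for (int i = 0; i < adds; i++) {
            if (size == capacity) {
                // The backing array is full: a larger array is allocated and
                // every existing element is copied into it (the O(n) step).
                copied += size;
                capacity = capacity + (capacity >> 1);
                resizes++;
            }
            size++;
        }
        return new long[] { resizes, copied, capacity };
    }

    public static void main(String[] args) {
        long[] r = simulate(100);
        System.out.println("resizes=" + r[0] + " copied=" + r[1] + " capacity=" + r[2]);
    }
}
```

Most calls to add do no copying at all, which is why the average cost stays constant, but each resize briefly needs both the old and the new array in memory. When the final size is known in advance, constructing the list with new ArrayList<>(expectedSize) avoids the intermediate copies.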

This is inefficient not only in time but also in memory, since every resize needs a second, larger copy of an already huge backing array for your ArrayLists. Consider using LinkedList, especially since your code does not appear to access the lists randomly by index.
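Because the lists are only appended to and then read once in order, another option worth considering is to not hold the generated words in a list at all, and to write each word to the output file as soon as it is produced. The sketch below assumes the final step is a pairwise combination of the input words; StreamingSketch and writeCombinations are hypothetical names, and the actual combination logic in the linked code may differ.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class StreamingSketch {
    // Writes every pairwise combination straight to the file, one per line,
    // and returns the number of lines written. Only the small input list
    // is kept in memory.
    static long writeCombinations(List<String> words, Path out) throws IOException {
        long count = 0;
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (String first : words) {
                for (String second : words) {
                    writer.write(first);
                    writer.write(second);
                    writer.write('\n');
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        List<String> words = Arrays.asList("red", "green", "blue");
        long written = writeCombinations(words, Paths.get("wordlist.txt"));
        System.out.println(written + " lines written");
    }
}
```

With this approach memory use depends on the size of the input lists rather than on the number of generated words, so the output can grow to billions of lines as long as there is disk space.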
