Weird Issue reading to a TreeSet

I seem to be having a weird issue: it is faster to read a file into an ArrayList and then copy that ArrayList into a TreeSet than to add the data directly to the TreeSet. I can't work out why.

public TreeSet<String> readFile(){
    TreeSet<String> dict = null;
    try {
        dict = new TreeSet<String>();
        BufferedReader in = new BufferedReader(new InputStreamReader(getAssets().open("dictionary")));
        String line;

        while ((line = in.readLine()) != null) {
            line = line.split(SEPARATOR)[0];
            dict.add(line);
        }

    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

    return dict;
}

Also, this problem seems to be related to the split function, since the code runs at normal speed without it.
My input file has around 160 000 lines.
ArrayList with TreeSet takes around 2000 ms.
TreeSet takes around 100 000 ms.
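
To check the two timings away from the Android asset, a small harness along these lines could be run. It is only a sketch: the class name `LoadTiming`, the synthetic word list, and the `";"` separator are assumptions standing in for the real dictionary file and the `SEPARATOR` constant, which the question does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class LoadTiming {
    // Hypothetical separator; the question's SEPARATOR value is not shown.
    static final String SEPARATOR = ";";

    // Builds synthetic "word;payload" lines in ascending order (assumed data).
    static List<String> syntheticLines(int n) {
        List<String> lines = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            lines.add(String.format("word%06d", i) + SEPARATOR + "payload");
        }
        return lines;
    }

    // Variant 1: split each line and add the first field straight to the TreeSet.
    static TreeSet<String> direct(List<String> lines) {
        TreeSet<String> tree = new TreeSet<>();
        for (String line : lines) {
            tree.add(line.split(SEPARATOR)[0]);
        }
        return tree;
    }

    // Variant 2: collect the first fields in an ArrayList, then copy into a TreeSet.
    static TreeSet<String> viaList(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            words.add(line.split(SEPARATOR)[0]);
        }
        TreeSet<String> tree = new TreeSet<>();
        for (String word : words) {
            tree.add(word);
        }
        return tree;
    }

    public static void main(String[] args) {
        List<String> lines = syntheticLines(160_000);
        long t0 = System.nanoTime();
        TreeSet<String> a = direct(lines);
        long t1 = System.nanoTime();
        TreeSet<String> b = viaList(lines);
        long t2 = System.nanoTime();
        System.out.println("direct:   " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("via list: " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("same contents: " + a.equals(b));
    }
}
```

A single run like this includes JIT warm-up in the first measurement, so swapping the order of the two variants, or repeating the run, gives a fairer comparison.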

ArrayList -> TreeSet Code

public TreeSet<String> readFile(){
    ArrayList<String> dict = null;
    try {
        dict = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new InputStreamReader(getAssets().open("dictionary")));
        String line;
        while ((line = in.readLine()) != null) {
            line = line.split(SEPARATOR)[0];
            dict.add(line);
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    TreeSet<String> tree = new TreeSet<String>();
    for(String word:dict){
        tree.add(word);
    }
    return tree;
}

Currently using a OnePlus One with CyanogenMod for the tests.

TreeSet orders its elements using the Comparable implementation defined on String, and it has to find the sorted position for each of the n Strings you add, one insert at a time.

ArrayList just appends at the next index and does no ordering work in the background.

By the time every element has been added, the TreeSet has had to place each one according to those ordering rules.

Defined here: API

Guaranteed log(n) time cost for the basic operations (add, remove and contains).
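
The difference in behaviour described above can be seen in a small sketch. The class name `OrderingDemo` and the sample words are made up for illustration; only `ArrayList` and `TreeSet` come from the discussion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class OrderingDemo {
    // ArrayList keeps the words exactly in the order they were added, duplicates included.
    static List<String> asInserted(String... words) {
        return new ArrayList<>(Arrays.asList(words));
    }

    // TreeSet places each word according to String's natural ordering and drops duplicates.
    static List<String> asSorted(String... words) {
        TreeSet<String> tree = new TreeSet<>();
        for (String word : words) {
            tree.add(word); // each add compares against existing elements to find its place
        }
        return new ArrayList<>(tree);
    }

    public static void main(String[] args) {
        System.out.println(asInserted("pear", "apple", "fig", "apple")); // [pear, apple, fig, apple]
        System.out.println(asSorted("pear", "apple", "fig", "apple"));   // [apple, fig, pear]
    }
}
```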

I guess you are reading a file that is already sorted. Inserting each element immediately would then tend to create a linear list, or require continuously rebalancing the tree to prevent this.
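
The "linear list" case can be illustrated with a deliberately naive binary search tree that never rebalances. This is not how TreeSet works (it is backed by a red-black tree, which rebalances on insert); `NaiveBst` and its helpers are hypothetical names used only to show what sorted input does to an unbalanced tree.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class NaiveBst {
    private static final class Node {
        final String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    private Node root;
    private int height; // number of nodes on the longest root-to-leaf path

    // Plain BST insert with no rebalancing; iterative to avoid deep recursion.
    void add(String key) {
        if (root == null) {
            root = new Node(key);
            height = 1;
            return;
        }
        Node cur = root;
        int depth = 1;
        while (true) {
            int c = key.compareTo(cur.key);
            if (c == 0) return; // ignore duplicates
            Node next = c < 0 ? cur.left : cur.right;
            if (next == null) {
                if (c < 0) cur.left = new Node(key); else cur.right = new Node(key);
                height = Math.max(height, depth + 1);
                return;
            }
            cur = next;
            depth++;
        }
    }

    int height() { return height; }

    static int heightAfterInserting(List<String> keys) {
        NaiveBst tree = new NaiveBst();
        for (String key : keys) tree.add(key);
        return tree.height();
    }

    // Distinct keys in ascending order (assumed sample data).
    static List<String> sortedKeys(int n) {
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) keys.add(String.format("w%05d", i));
        return keys;
    }

    public static void main(String[] args) {
        List<String> sorted = sortedKeys(1000);
        List<String> shuffled = new ArrayList<>(sorted);
        Collections.shuffle(shuffled, new Random(42));
        // Sorted input makes every node a right child, so the height equals the element count.
        System.out.println("sorted input height:   " + heightAfterInserting(sorted));
        System.out.println("shuffled input height: " + heightAfterInserting(shuffled));
    }
}
```

With sorted input every insert walks the whole chain, which is the quadratic behaviour that a self-balancing tree pays rebalancing work to avoid.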

TreeSet.addAll(Collection) first sorts (relatively fast for an already sorted list), and then uses an optimized algorithm that relies on the elements being sorted to build a (balanced) tree.
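
Note that the question's second version calls add in a loop rather than addAll, so a bulk path would only come into play if the code were changed. A minimal sketch of the alternatives follows; `BulkLoad` and its method names are hypothetical. Whether any of them is actually faster depends on the runtime library: in the OpenJDK sources, as far as I know, the linear-time path is taken only when an empty TreeSet is filled from a SortedSet with the same ordering, and Android's implementation may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class BulkLoad {
    // One add per element, as in the question's ArrayList -> TreeSet code.
    static TreeSet<String> viaLoop(List<String> words) {
        TreeSet<String> tree = new TreeSet<>();
        for (String word : words) {
            tree.add(word);
        }
        return tree;
    }

    // Bulk insert from an arbitrary collection; may simply add one element at a time.
    static TreeSet<String> viaAddAll(List<String> words) {
        TreeSet<String> tree = new TreeSet<>();
        tree.addAll(words);
        return tree;
    }

    // Copy from a source that is already a SortedSet, using the TreeSet(SortedSet) constructor.
    static TreeSet<String> viaSortedSet(SortedSet<String> sortedWords) {
        return new TreeSet<>(sortedWords);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "banana", "cherry", "banana");
        TreeSet<String> loop = viaLoop(words);
        System.out.println(loop.equals(viaAddAll(words)));   // true
        System.out.println(loop.equals(viaSortedSet(loop))); // true
    }
}
```

All three produce the same set; only the amount of work done during construction can differ.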
