
Weird issue reading into a TreeSet

I seem to be having a weird issue: it is faster to read a file into an ArrayList and then copy that ArrayList into a TreeSet than to add the data directly to the TreeSet. I can't understand why.

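import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.TreeSet;

// Inside an Activity: getAssets() comes from Context; SEPARATOR is a String constant defined elsewhere.
public TreeSet<String> readFile() {
    TreeSet<String> dict = new TreeSet<String>();
    // try-with-resources closes the reader (and the asset stream) when done.
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(getAssets().open("dictionary")))) {
        String line;
        while ((line = in.readLine()) != null) {
            // Keep only the first field of each line.
            line = line.split(SEPARATOR)[0];
            // Each add compares the word with existing entries to keep the set sorted.
            dict.add(line);
        }
    } catch (IOException e) { // also covers FileNotFoundException
        e.printStackTrace();
    }
    return dict;
}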

The problem also seems to be related to the split call, since the code runs at normal speed without it.
My input file has around 160,000 lines.
ArrayList, then TreeSet: around 2,000 ms.
TreeSet directly: around 100,000 ms.
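If SEPARATOR is a plain literal string (an assumption; the question does not show its value), the first field of a line can also be taken with indexOf and substring, which avoids split's regex semantics and the array it allocates for every line. A minimal sketch, where firstField and the ";" separator are hypothetical:

```java
public class FirstField {
    /**
     * Hypothetical helper: returns the text before the first occurrence of the
     * literal separator, or the whole line if the separator does not occur.
     * Unlike String.split, sep is NOT interpreted as a regular expression.
     */
    static String firstField(String line, String sep) {
        int idx = line.indexOf(sep);
        return idx < 0 ? line : line.substring(0, idx);
    }

    public static void main(String[] args) {
        String sep = ";"; // hypothetical separator; the question's SEPARATOR is not shown
        String[] lines = {"apple;fruit", "carrot;vegetable", "plain"};
        for (String line : lines) {
            // For these inputs both expressions yield the same first field.
            String viaSplit = line.split(sep)[0];
            String viaIndexOf = firstField(line, sep);
            System.out.println(viaSplit + " " + viaIndexOf);
        }
    }
}
```

This is not offered as the explanation for the slowdown; it only removes the per-line array allocation and regex handling from the loop.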

ArrayList -> TreeSet Code

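import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.TreeSet;

// Inside an Activity: getAssets() comes from Context; SEPARATOR is a String constant defined elsewhere.
public TreeSet<String> readFile() {
    ArrayList<String> dict = new ArrayList<String>();
    // try-with-resources closes the reader (and the asset stream) when done.
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(getAssets().open("dictionary")))) {
        String line;
        while ((line = in.readLine()) != null) {
            line = line.split(SEPARATOR)[0];
            dict.add(line); // ArrayList.add just appends; no comparisons
        }
    } catch (IOException e) { // also covers FileNotFoundException
        e.printStackTrace();
    }
    // Copy into the TreeSet one element at a time, after the whole file is read.
    TreeSet<String> tree = new TreeSet<String>();
    for (String word : dict) {
        tree.add(word);
    }
    return tree;
}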

I am running the tests on a OnePlus One with CyanogenMod.

TreeSet uses the natural ordering of String (its Comparable implementation) and keeps its elements sorted at all times, so each of the n add calls compares the new string with existing elements to find its position.

ArrayList just appends each element at the next index and does no extra work in the background.

Once all the elements reach the TreeSet, they still have to be ordered according to those comparison rules.
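To make this concrete, the sketch below counts comparator calls with a hypothetical CountingComparator on synthetic, already-sorted words (not the question's dictionary file). ArrayList.add never compares anything, while every TreeSet.add does. In the question's second version the same TreeSet.add calls still happen, only after the file has been read, so both versions perform the same number of comparisons.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class CountComparisons {
    // String's natural ordering, plus a counter of how often compare() is called.
    static final class CountingComparator implements Comparator<String> {
        long calls = 0;

        @Override
        public int compare(String a, String b) {
            calls++;
            return a.compareTo(b);
        }
    }

    // Hypothetical stand-in for the dictionary: zero-padded, so already sorted.
    static List<String> sortedWords(int n) {
        List<String> words = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            words.add(String.format("word%06d", i));
        }
        return words;
    }

    // Version 1: add each word straight into the TreeSet.
    static long direct(List<String> words) {
        CountingComparator cmp = new CountingComparator();
        TreeSet<String> set = new TreeSet<>(cmp);
        for (String w : words) {
            set.add(w);
        }
        return cmp.calls;
    }

    // Version 2: append to an ArrayList first (no comparisons), then copy.
    static long viaList(List<String> words) {
        List<String> list = new ArrayList<>();
        for (String w : words) {
            list.add(w);
        }
        CountingComparator cmp = new CountingComparator();
        TreeSet<String> set = new TreeSet<>(cmp);
        for (String w : list) {
            set.add(w);
        }
        return cmp.calls;
    }

    public static void main(String[] args) {
        List<String> words = sortedWords(160_000);
        System.out.println("direct:   " + direct(words) + " comparisons");
        System.out.println("via list: " + viaList(words) + " comparisons");
    }
}
```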

As stated in the TreeSet API documentation:

This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).
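A rough worked estimate of what that guarantee means for the question's file, assuming about log2(i) comparisons for the i-th insertion and ignoring the red-black tree's constant factor:

```latex
C(n) \approx \sum_{i=1}^{n} \log_2 i = \log_2 n! \approx n \log_2 n - n \log_2 e

n = 160\,000 \;\Rightarrow\; n \log_2 n \approx 160\,000 \times 17.29 \approx 2.77 \times 10^{6}, \quad n \log_2 e \approx 2.31 \times 10^{5}

C(160\,000) \approx 2.77 \times 10^{6} - 0.23 \times 10^{6} \approx 2.5 \times 10^{6}
```

Both versions in the question make about 160,000 TreeSet insertions, so this estimate applies to each of them.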

I guess you are reading a file that is already sorted. Inserting sorted elements one at a time would then tend to build a linear list, or require continually rebalancing the tree to prevent this; since TreeSet is backed by a red-black tree (TreeMap), it does the latter.
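The degenerate case can be illustrated with a deliberately naive, unbalanced binary search tree (the hypothetical NaiveBst below, which is not how java.util.TreeSet works). Feeding it sorted keys produces a tree whose height equals the number of keys, i.e. a linked list, while the same keys in shuffled order give a much shallower tree. A red-black tree avoids this by rebalancing, which keeps its height at most about 2·log2(n+1).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class DegenerateTree {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Hypothetical unbalanced BST: no rotations, no rebalancing.
    static final class NaiveBst {
        Node root;
        int height; // nodes on the longest root-to-leaf path

        void insert(int key) {
            if (root == null) {
                root = new Node(key);
                height = 1;
                return;
            }
            Node cur = root;
            int depth = 1;
            while (true) {
                depth++; // depth the new node would have below cur
                if (key < cur.key) {
                    if (cur.left == null) { cur.left = new Node(key); break; }
                    cur = cur.left;
                } else if (key > cur.key) {
                    if (cur.right == null) { cur.right = new Node(key); break; }
                    cur = cur.right;
                } else {
                    return; // duplicate: a set ignores it
                }
            }
            height = Math.max(height, depth);
        }
    }

    static int heightAfterInserting(List<Integer> keys) {
        NaiveBst tree = new NaiveBst();
        for (int k : keys) tree.insert(k);
        return tree.height;
    }

    public static void main(String[] args) {
        int n = 1000;
        List<Integer> sorted = new ArrayList<>();
        for (int i = 0; i < n; i++) sorted.add(i);
        List<Integer> shuffled = new ArrayList<>(sorted);
        Collections.shuffle(shuffled, new Random(42));

        System.out.println("sorted input height:   " + heightAfterInserting(sorted));
        System.out.println("shuffled input height: " + heightAfterInserting(shuffled));
    }
}
```

With the sorted keys 0 to 999 the height is 1000, since every new key becomes the right child of the previous one.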

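TreeSet.addAll(Collection) has an optimized path that builds a balanced tree directly from elements already known to be sorted, without comparing them. In OpenJDK this path is only used when the set is empty and the argument is a SortedSet with the same comparator; for other collections, such as an ArrayList, addAll adds the elements one by one.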
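The conditions for the fast path can be checked by counting comparator calls. This relies on OpenJDK's implementation (the TreeSet Javadoc does not promise it, and Android's libcore may differ), and CountingComparator is a hypothetical helper. Copying from a SortedSet with the same comparator into an empty TreeSet makes no comparisons, while copying from an already-sorted ArrayList still compares element by element. Note that the question's second version uses a loop of tree.add calls rather than addAll.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class AddAllFastPath {
    // String's natural ordering, plus a counter of how often compare() is called.
    static final class CountingComparator implements Comparator<String> {
        long calls = 0;

        @Override
        public int compare(String a, String b) {
            calls++;
            return a.compareTo(b);
        }
    }

    // addAll from a SortedSet that uses the very same comparator instance.
    static long fromSortedSet(int n) {
        CountingComparator cmp = new CountingComparator();
        TreeSet<String> source = new TreeSet<>(cmp);
        for (int i = 0; i < n; i++) source.add(String.format("word%06d", i));
        cmp.calls = 0; // count only what addAll does

        TreeSet<String> target = new TreeSet<>(cmp); // empty, same comparator
        target.addAll(source);
        return cmp.calls;
    }

    // addAll from an ArrayList whose contents happen to be sorted.
    static long fromList(int n) {
        List<String> source = new ArrayList<>();
        for (int i = 0; i < n; i++) source.add(String.format("word%06d", i));

        CountingComparator cmp = new CountingComparator();
        TreeSet<String> target = new TreeSet<>(cmp);
        target.addAll(source);
        return cmp.calls;
    }

    public static void main(String[] args) {
        int n = 160_000;
        System.out.println("addAll(SortedSet): " + fromSortedSet(n) + " comparisons");
        System.out.println("addAll(ArrayList): " + fromList(n) + " comparisons");
    }
}
```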
