
Arrays.sort() vs sorting using map

I have a requirement where I have to loop through an array which has list of strings:

String[] arr = {"abc","cda","cka","snd"};

and match the string "bca", ignoring the order of the characters. This should return true, because a string with the same characters ("abc") is present in the array.

To solve this I have two approaches:

  1. Use Arrays.sort() to sort both strings and then use Arrays.equals() to compare them.
  2. Create two HashMaps, record the frequency of each letter of each string, and finally compare the two maps using the equals method.

I read that the time complexity of Arrays.sort() is higher, so I decided to work on the second approach. But when I run both versions, the first approach takes much less time to execute.

Any suggestions why this is happening?

The Time Complexity only tells you how the approach will scale with (significantly) larger input. It doesn't tell you which approach is faster.

It's perfectly possible that a solution is faster for small input sizes (string lengths and/or array length) but scales badly for larger sizes, due to its Time Complexity. But it's even possible that you never encounter the point where an algorithm with a better Time Complexity becomes faster, when natural limits to the input sizes prevent it.

You didn't show the code of your approaches, but it's likely that your first approach calls a method like toCharArray() on the strings, followed by Arrays.sort(char[]). This implies that the sort operates on primitive data.

In contrast, when your second approach uses a HashMap<Character,Integer> to record frequencies, it will be subject to boxing overhead, for the characters and the counts, and also use a significantly larger data structure that needs to be processed.
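Since the question does not include code, the following is only a guess at what the two approaches might look like. The class and method names (AnagramApproaches, sortedEquals, frequencies, frequencyEquals, containsAnagram) are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical reconstruction of the two approaches; the question does not show the real code.
class AnagramApproaches {

    // Approach 1: sort both char arrays (primitive data) and compare them.
    static boolean sortedEquals(String a, String b) {
        char[] x = a.toCharArray();
        char[] y = b.toCharArray();
        Arrays.sort(x);
        Arrays.sort(y);
        return Arrays.equals(x, y);
    }

    // Approach 2: build one frequency map per string.
    // Every char and every count is boxed (Character, Integer).
    static Map<Character, Integer> frequencies(String s) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : s.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }
        return counts;
    }

    static boolean frequencyEquals(String a, String b) {
        return frequencies(a).equals(frequencies(b));
    }

    // Linear scan over the array, here using the sort-based comparison.
    static boolean containsAnagram(String[] arr, String target) {
        for (String s : arr) {
            if (sortedEquals(s, target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] arr = {"abc", "cda", "cka", "snd"};
        System.out.println(containsAnagram(arr, "bca"));   // true, matches "abc"
        System.out.println(frequencyEquals("abc", "bca")); // true
    }
}
```

Both comparisons give the same answer; they differ only in how much work and how many objects each call needs.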

So it's not surprising that the hash approach is slower for small strings and arrays, as it has a significantly larger fixed overhead and also a size-dependent (O(n)) overhead.

So the first approach would have to suffer significantly from the O(n log n) time complexity to turn this result around. But this won't happen. That time complexity is the worst case of sorting in general. As explained in this answer, the algorithms specified in the documentation of Arrays.sort should not be taken for granted. When you call Arrays.sort(char[]) and the array size crosses a certain threshold, the implementation will turn to Counting Sort with an O(n) time complexity (but use more memory temporarily).

So even with large strings, you won't suffer from a worse time complexity. In fact, the Counting Sort shares similarities with the frequency map, but usually is more efficient, as it avoids the boxing overhead, using an int[] array instead of a HashMap<Character,Integer>.
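To illustrate the int[] idea on its own (this is not the code Arrays.sort uses internally), here is a minimal sketch of a frequency comparison without boxing. It assumes that both strings contain only the lowercase letters 'a' to 'z'; the class name CountingAnagram is made up.

```java
// Sketch: frequency counting with a primitive int[] instead of a HashMap.
// Assumption: both strings contain only the lowercase letters 'a' to 'z'.
class CountingAnagram {

    static boolean isAnagram(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        int[] counts = new int[26];
        for (int i = 0; i < a.length(); i++) {
            counts[a.charAt(i) - 'a']++; // count up for the first string
            counts[b.charAt(i) - 'a']--; // count down for the second string
        }
        // The strings have the same letters exactly when every counter is back at zero.
        for (int c : counts) {
            if (c != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAnagram("abc", "bca")); // true
        System.out.println(isAnagram("abc", "abd")); // false
    }
}
```

For a wider character range the array would have to be larger, which is the temporary memory cost mentioned above.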

Approach 1 will be O(N log N).

Approach 2 will be O(N*M), where M is the length of each string in your array.

You should search linearly in O(N):

for (String str : arr) {
    if (str.equals(target)) return true;
}
return false;

Let's decompose the problem:

You need a function to sort a string by its chars (bccabc -> abbccc) to be able to compare a given string with the existing ones.

Function<String, String> sortChars = s -> s.chars()
        .sorted()
        .mapToObj(i -> (char) i)
        .map(String::valueOf)
        .collect(Collectors.joining());

Instead of sorting the chars of the array's strings every time you compare them, you can precompute the set of unique tokens (the values from your array, each with its chars sorted):

Set<String> tokens = Arrays.stream(arr)
        .map(sortChars)
        .collect(Collectors.toSet());

This will result in the values "abc", "acd", "ack", "dns".

Afterwards you can create a function which checks if a given string, when sorted by chars, matches any of the given tokens:

Predicate<String> match = s -> tokens.contains(sortChars.apply(s));

Now you can easily check any given string as follows:

boolean matches = match.test("bca");

Matching only needs to sort the given input and do a hash set lookup, so it's very efficient.

You can of course write the Function and Predicate as methods instead (String sortChars(String s) and boolean matches(String s)) if you're unfamiliar with functional programming.
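A minimal sketch of that method-based variant could look like this. The class name TokenMatcher is made up, and sorting is done with toCharArray() and Arrays.sort(char[]) rather than a stream, which is an assumption about how one might write it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch of the same idea with plain methods; the class name TokenMatcher is hypothetical.
class TokenMatcher {

    // Precomputed tokens: every array value with its chars sorted.
    private final Set<String> tokens = new HashSet<>();

    TokenMatcher(String[] arr) {
        for (String s : arr) {
            tokens.add(sortChars(s));
        }
    }

    // Returns the chars of s in sorted order, e.g. "bccabc" -> "abbccc".
    static String sortChars(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Sorts the input once and does a single hash set lookup.
    boolean matches(String s) {
        return tokens.contains(sortChars(s));
    }

    public static void main(String[] args) {
        TokenMatcher matcher = new TokenMatcher(new String[] {"abc", "cda", "cka", "snd"});
        System.out.println(matcher.matches("bca")); // true
        System.out.println(matcher.matches("xyz")); // false
    }
}
```

The set is built once in the constructor, so repeated calls to matches only pay for sorting the input string.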

More of an addendum to the other answers. Of course, your two options have different performance characteristics. But: understand that performance is not necessarily the only factor to make a decision!

Meaning: if you are talking about a search that runs hundreds or thousands of times per minute, on large data sets, then for sure, you should invest a lot of time to come up with a solution that gives you the best performance. Most likely, that includes doing various experiments with actual measurements when processing real data. Time complexity is a theoretical construct. In the real world, there are also elements such as CPU cache sizes, threading issues, IO bottlenecks, and whatnot that can have a significant impact on real numbers.
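As a very rough sketch of such a measurement, one could time a task with System.nanoTime(). All names here (RoughTiming, averageNanos) are made up, and a loop like this only gives a coarse indication; a dedicated harness such as JMH is the more reliable tool for real benchmarks.

```java
import java.util.Arrays;
import java.util.function.BooleanSupplier;

// Naive timing sketch; all names are hypothetical. Not a substitute for a real benchmark harness.
class RoughTiming {

    // Runs the task `iterations` times (must be > 0) after a short warm-up
    // and returns the average wall-clock time per run in nanoseconds.
    static long averageNanos(BooleanSupplier task, int iterations) {
        for (int i = 0; i < iterations; i++) {
            task.getAsBoolean(); // warm-up runs, not measured
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.getAsBoolean();
        }
        return (System.nanoTime() - start) / iterations;
    }

    // Example task to measure: the sort-based comparison of two strings.
    static boolean sortedEquals(String a, String b) {
        char[] x = a.toCharArray();
        char[] y = b.toCharArray();
        Arrays.sort(x);
        Arrays.sort(y);
        return Arrays.equals(x, y);
    }

    public static void main(String[] args) {
        long nanos = averageNanos(() -> sortedEquals("abc", "bca"), 100_000);
        System.out.println("average ns per call: " + nanos);
    }
}
```

The numbers will vary between runs and machines, so they are only meaningful when compared on the same setup with realistic input.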

But: when your code does its work just once a minute, even on a few dozen or hundred MB of data... then it might not be worth focusing on performance.

In other words: the "sort" solution sounds straightforward. It is easy to understand, easy to implement, and hard to get wrong (with some decent test cases). If that solution gets the job done "good enough", then consider using that: the simple solution.

Performance is a luxury problem. You only address it if there is a reason to.

