
Is it legitimate to change a String to Int so that I can sort it better?

I have many Strings, for example ("a32ghS:SAD"), and I need to sort them. Is it okay to compute an integer value like this:

String s = "a32ghS:SAD";
int l = 0;
for (int i = 0; i < s.length(); i++) {
    l += (int) s.charAt(i);
}

Is it okay to sort the Strings based on the integer l? Or should I sort them based on the Strings themselves?

Much depends on what you want to do. :)

However, if you derive the integer inside each comparison while sorting, you'll end up performing O(N log N) string-to-int conversions. If you instead convert your strings once before sorting, you drop to only O(N) conversions.

Simply adding up the character values will sort the strings incorrectly (assuming you want alphabetical order). Consider the string "aZZZZ": with your code sample it will come after "b". Your method sorts the strings by the sum of the character codes they contain, which is not particularly useful.

Assuming you want to sort alphabetically, use the Java library method Collections.sort, since the code to do this is already written.
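To see the misordering concretely, here is a minimal sketch that sorts the same three strings by the character sum from the question and by String's natural order. The class and method names (CharSumDemo, charSum, sortedBySum, sortedNaturally) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class CharSumDemo {
    // The key from the question: the sum of every char value in the string.
    static int charSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    // Returns a copy of the input ordered by charSum (not alphabetically).
    static List<String> sortedBySum(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Comparator.comparingInt(CharSumDemo::charSum));
        return copy;
    }

    // Returns a copy of the input in String's natural (compareTo) order.
    static List<String> sortedNaturally(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("b", "aZZZZ", "c");
        System.out.println(sortedBySum(words));     // [b, c, aZZZZ]
        System.out.println(sortedNaturally(words)); // [aZZZZ, b, c]
    }
}
```

"aZZZZ" sums to 97 + 4 * 90 = 457, far above "b" (98) and "c" (99), so the sum-based order puts it last even though it starts with "a".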

import java.util.ArrayList;
import java.util.Collections;

ArrayList<String> list = new ArrayList<String>();

list.add("cc");
list.add("bb");
list.add("dd");
list.add("aa");

// Sorts in place using String's natural (compareTo) ordering.
Collections.sort(list);

The typical alphabetical sort works by comparing the character codes (ASCII values for plain English text) in the first position and ordering by them; if those characters are the same, the next position is considered, and so on.

You won't be able to beat this sort of performance unless you need a particular ordering or can exploit some knowledge you have about the strings.

That would make "a32ghS:SAD" and "S32gha:SAD" have the same integer representation. You would also have trouble converting the integers back to strings (you'd have to use some map structure).

So the answer is: just sort the strings. It's not really a slow operation (though that depends on the number of items, of course).

No, because the position in the string matters (see the other answers for that). But if you know the maximum length of your strings, and you do a bitwise shift on the value after adding each character, it might be OK.
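A rough sketch of that shift idea, under some loud assumptions: strings are at most 4 chars long (4 x 16 bits fills one 64-bit long), they contain no NUL character, and the keys are compared as unsigned values. PackedKeyDemo, MAX_LEN and packedKey are hypothetical names, not anything from the answers here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PackedKeyDemo {
    // Assumption: no string is longer than 4 chars, so 4 x 16 bits fit in a long.
    static final int MAX_LEN = 4;

    // Shifts the key left by 16 bits per position, so earlier characters
    // end up in the more significant bits. Short strings are padded with 0.
    static long packedKey(String s) {
        if (s.length() > MAX_LEN) {
            throw new IllegalArgumentException("string longer than " + MAX_LEN);
        }
        long key = 0L;
        for (int i = 0; i < MAX_LEN; i++) {
            char c = i < s.length() ? s.charAt(i) : '\0';
            key = (key << 16) | c;
        }
        return key;
    }

    // Returns a copy of the input ordered by the packed key.
    static List<String> sortedByPackedKey(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        // Unsigned comparison, because a first char >= 0x8000 sets the sign bit.
        copy.sort((a, b) -> Long.compareUnsigned(packedKey(a), packedKey(b)));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("b", "aZZZ", "ab", "a");
        System.out.println(sortedByPackedKey(words)); // [a, aZZZ, ab, b]
    }
}
```

Unlike the plain sum, this key keeps positional information, so under the stated assumptions it orders strings the same way compareTo does. The obvious cost is the hard length limit.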

Keep in mind that String.compareTo uses the Unicode values of each character in much the same way, and that compareTo is case-sensitive (compareToIgnoreCase is the case-insensitive variant).
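The effect of case sensitivity is easy to see with a small example. This sketch contrasts the natural order with the standard String.CASE_INSENSITIVE_ORDER comparator; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class CaseOrderDemo {
    // Natural order: uppercase letters have lower char values than lowercase.
    static List<String> caseSensitive(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    // Same data, ordered with the JDK's built-in case-insensitive comparator.
    static List<String> caseInsensitive(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "Banana", "cherry");
        System.out.println(caseSensitive(words));   // [Banana, apple, cherry]
        System.out.println(caseInsensitive(words)); // [apple, Banana, cherry]
    }
}
```

"Banana" sorts first in the natural order because 'B' (66) is smaller than 'a' (97).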

In the Cassandra database, they do something like that by default. However, to compute the integer they compute a hash using murmur3. A hash is similar to your simple sum, but you are unlikely to find two strings with the same hash (collisions exist, they are just rare).

In that case it is useful because you compute the hash once and then search possibly millions of rows. It makes things really fast because the hash allows the search to be sharded (i.e., if you have 201 computers and use groups of 3 computers to save your data [for replication], you have 67 groups, so a database searching 10,000,000 rows means searching about 149,253 rows on one of these little clusters).

Note that as a result the strings are not sorted alphabetically.

Now, to sort strings in memory, you probably want to just use sort() with the strings themselves as the key. Given the time to compute the hash, the need to store it, and the extra memory it uses, you're not likely to save anything. A standard sort needs on the order of N log N comparisons, which is roughly 20 comparisons per string for 1,000,000 strings. It will be fast.

In Java, if you need to attach data to each string, use a Map. If you do not need any data, use a SortedSet.
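As a minimal sketch of those two options: TreeSet and TreeMap are the JDK's standard sorted implementations, and picking TreeMap for the Map case is my assumption, since a plain HashMap would not keep the keys in order. The class name, method names, and the string-length payload are made up for the example.

```java
import java.util.Arrays;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class SortedCollectionsDemo {
    // No attached data: a SortedSet keeps the strings in natural order.
    // Note that a set silently drops duplicate strings.
    static SortedSet<String> sortedNames(String... names) {
        return new TreeSet<>(Arrays.asList(names));
    }

    // Attached data: a sorted map keyed by the string.
    // The value here (the string's length) is just placeholder data.
    static SortedMap<String, Integer> lengthsByName(String... names) {
        SortedMap<String, Integer> result = new TreeMap<>();
        for (String name : names) {
            result.put(name, name.length());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sortedNames("cc", "bb", "dd", "aa")); // [aa, bb, cc, dd]
        System.out.println(lengthsByName("cc", "a", "bbb"));     // {a=1, bbb=3, cc=2}
    }
}
```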
