How to implement a Unicode-aware selection sort algorithm in Java

I am studying the selection sort algorithm. Consider this implementation:

http://algs4.cs.princeton.edu/21elementary/Selection.java.html

I have a text file that contains words with non-ASCII Unicode characters, like this:

$ more words.txt
şeftali içel ırak üzüm uzun çorba çimen ufuk

When I run the program, it doesn't sort the Unicode characters correctly.

$ java-algs4 Selection < words.txt
içel
ufuk
uzun
çimen
çorba
üzüm
ırak
şeftali

My first attempt was to use a collator.

import java.util.*;
import java.text.*;

public class StringSorter
{
    public static void sortStrings(Collator c, String[] words)
    {
        String tmp;

        for (int i = 0; i < words.length; ++i)
        {
            for (int j = 0; j < words.length; ++j)
            {
                if (c.compare(words[i], words[j]) < 0)
                {
                    tmp = words[i];
                    words[i] = words[j];
                    words[j] = tmp;
                }
            }
        }
    }

    public static void printStrings(String[] words)
    {
        for (int i = 0; i < words.length; ++i)
        {
            System.out.println(words[i]);
        }
    }

    public static void main(String[] args)
    {
        Collator tr_TRCollator = Collator.getInstance(new Locale("tr", "TR"));

        String[] words = {"şeftali", "içel", "ırak", "üzüm", "uzun", "çorba", "çimen", "ufuk"};
        sortStrings(tr_TRCollator, words);
        printStrings(words);
    }

}

This program sorts the words correctly, as expected.

$ java-algs4 StringSorter
çimen
çorba
ırak
içel
şeftali
ufuk
uzun
üzüm

My question is: how should we implement a Unicode-aware selection sort algorithm in Java?

Also, the sort method of the Selection class takes a Comparator object as its second parameter. Is it possible to write our own implementation of the Comparator interface so that it sorts Unicode elements correctly?

 public static void sort(Object[] a, Comparator c)

Any help would be appreciated. Thanks.

The Collator class implements the Comparator interface, so you can pass tr_TRCollator to Selection.sort as the second argument.
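To illustrate passing a Collator where a Comparator is expected, here is a minimal sketch. The class name CollatorSelectionSort and its generic sort method are my own stand-ins, not the algs4 Selection class; I am assuming its sort(Object[], Comparator) behaves like an ordinary comparator-driven selection sort. Locale.forLanguageTag("tr-TR") is used in place of the Locale constructor from the question.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical stand-in for algs4's Selection class (not the original code).
public class CollatorSelectionSort {

    // Plain selection sort; the comparator alone decides what "smaller" means.
    public static <T> void sort(T[] a, Comparator<? super T> c) {
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            // Find the smallest remaining element according to c.
            for (int j = i + 1; j < a.length; j++) {
                if (c.compare(a[j], a[min]) < 0) {
                    min = j;
                }
            }
            // Move it into position i.
            T tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
    }

    public static void main(String[] args) {
        // Collator implements Comparator<Object>, so it can be passed directly.
        Collator trCollator = Collator.getInstance(Locale.forLanguageTag("tr-TR"));
        String[] words = {"şeftali", "içel", "ırak", "üzüm", "uzun", "çorba", "çimen", "ufuk"};
        sort(words, trCollator);
        for (String w : words) {
            System.out.println(w);
        }
    }
}
```

The sorting loop itself needs no Unicode-specific logic. All locale knowledge lives in the comparator, so the same method should reproduce the order reported in the question when given the Turkish collator, provided the JDK ships Turkish collation data.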

You could normalize the strings and fall back to a plain Unicode comparison if they otherwise match.

import java.text.Normalizer;
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.Stream;

String[] words = "şeftali içel ırak üzüm uzun çorba çimen ufuk".split(" ");
Arrays.sort(words, Comparator.comparing((String w) -> 
                                        Normalizer.normalize(w, Normalizer.Form.NFD))
                             .thenComparing(Comparator.naturalOrder()));
Stream.of(words).forEach(System.out::println);

prints

çimen
çorba
içel
şeftali
ufuk
uzun
üzüm
ırak

This is close, but it doesn't treat ı as being like i.

The important point is that your second example uses locale settings. The sort order for strings is locale-dependent and has nothing to do with the Unicode code points of the characters. Even countries that share a language, such as Austria, Germany, and Switzerland, have subtle differences in string sorting order.
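A small sketch of that difference, using the words from the question. The class OrderComparison and its two helper methods are hypothetical names introduced for illustration only. The first helper relies on String's natural ordering, which compares UTF-16 code unit values; the second uses the collation rules of whatever locale is passed in.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
public class OrderComparison {

    // Sorts a copy with String.compareTo, i.e. by UTF-16 code unit values.
    public static String[] byCodeUnits(String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Sorts a copy using the collation rules of the given locale.
    public static String[] byLocale(String[] words, Locale locale) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"şeftali", "içel", "ırak", "üzüm", "uzun", "çorba", "çimen", "ufuk"};
        System.out.println(String.join(" ", byCodeUnits(words)));
        System.out.println(String.join(" ", byLocale(words, Locale.forLanguageTag("tr-TR"))));
    }
}
```

The first line of output matches the incorrect order from the question (içel, ufuk, uzun, çimen, çorba, üzüm, ırak, şeftali), because ç, ü, ı and ş all have higher code unit values than the ASCII letters. The second line depends on the collation data available for the requested locale.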
