
Translating strings character by character

How should I go about implementing a method that takes a String composed of Latin characters and translates it into a String composed of a different set of characters, say Cyrillic?

Here's how it's done in PHP for example:

function latin_to_cyrillic($string)
{
 $array = array(
  "а" => "a",
  "б" => "b",
  "в" => "v",
  "г" => "g",
  "д" => "d",
  "е" => "e",
  "ж" => "zh",
  "з" => "z",
  "и" => "i",
  "й" => "y",
  "к" => "k",
  "л" => "l",
  "м" => "m",
  "н" => "n",
  "о" => "o",
  "п" => "p",
  "р" => "r",
  "с" => "s",
  "т" => "t",
  "у" => "u",
  "ф" => "f",
  "х" => "h",
  "ц" => "ts",
  "ч" => "ch",
  "ш" => "sh",
  "щ" => "sht",
  "ь" => "y",
  "ъ" => "a",
  "ю" => "yu",
  "я" => "ya",
  "А" => "A",
  "Б" => "B",
  "В" => "V",
  "Г" => "G",
  "Д" => "D",
  "Е" => "E",
  "Ж" => "Zh",
  "З" => "Z",
  "И" => "I",
  "Й" => "Y",
  "К" => "K",
  "Л" => "L",
  "М" => "M",
  "Н" => "N",
  "О" => "O",
  "П" => "P",
  "Р" => "R",
  "С" => "S",
  "Т" => "T",
  "У" => "U",
  "Ф" => "F",
  "Х" => "H",
  "Ц" => "Ts",
  "Ч" => "Ch",
  "Ш" => "Sh",
  "Щ" => "Sht",
  "Ь" => "Y",
  "Ъ" => "A",
  "Ю" => "Yu",
  "Я" => "Ya",
  "–" => "-");

 return str_replace(array_values($array), array_keys($array), $string);

}

First of all you need a conversion table, defining the translation for every character.

Then you read the string char by char, and use the translation table to get the translation. Easy, right?

You can use something like this:

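import java.util.HashMap;
import java.util.Map;

class Translator {
 // Maps each source character (as a String) to its translation.
 private final Map<String,String> translation = new HashMap<String,String>();

 public Translator(){
  //Populate the translation table here;
 }

 public String translate(String origin){
  // StringBuilder avoids creating a new String on every iteration.
  StringBuilder destiny = new StringBuilder();
  for(int i=0;i<origin.length();i++){
   String character = Character.toString(origin.charAt(i));
   // Keep characters that have no entry instead of appending "null".
   String translated = translation.get(character);
   destiny.append(translated != null ? translated : character);
  }
  return destiny.toString();
 }
}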
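
One thing a strict char-by-char loop cannot do is the Latin to Cyrillic direction from the question, where several Latin letters map to a single Cyrillic one ("zh" to "ж", "sht" to "щ"). A minimal sketch of one way around that is to try the longest key first at each position. The class name GreedyTranslator, the constant MAX_KEY_LENGTH and the tiny sample table are assumptions made up for illustration, not part of the answer above.

```java
import java.util.HashMap;
import java.util.Map;

public class GreedyTranslator {
    // Hypothetical, deliberately tiny table; fill in the rest of the alphabet as needed.
    private static final Map<String, String> TABLE = new HashMap<>();
    // Length of the longest key in TABLE ("sht").
    private static final int MAX_KEY_LENGTH = 3;

    static {
        TABLE.put("a", "а");
        TABLE.put("s", "с");
        TABLE.put("h", "х");
        TABLE.put("t", "т");
        TABLE.put("z", "з");
        TABLE.put("sh", "ш");
        TABLE.put("zh", "ж");
        TABLE.put("sht", "щ");
    }

    public static String translate(String origin) {
        StringBuilder result = new StringBuilder();
        int i = 0;
        while (i < origin.length()) {
            boolean matched = false;
            // Try the longest candidate first so "sht" wins over "sh" and "s".
            int longest = Math.min(MAX_KEY_LENGTH, origin.length() - i);
            for (int len = longest; len >= 1; len--) {
                String replacement = TABLE.get(origin.substring(i, i + len));
                if (replacement != null) {
                    result.append(replacement);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                // No entry for this character: copy it through unchanged.
                result.append(origin.charAt(i));
                i++;
            }
        }
        return result.toString();
    }
}
```

With this sample table, GreedyTranslator.translate("shtas") gives "щас" and GreedyTranslator.translate("zha!") gives "жа!".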

Alternatively you could use

replaceEach(String text, String[] searchList, String[] replacementList) 
           Replaces all occurrences of Strings within another String.

This is from org.apache.commons.lang.StringUtils. You could populate a String[] with the Latin characters (as Strings), populate another String[] of the same length with the corresponding Cyrillic characters, and pass both to that function.

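import org.apache.commons.lang.StringUtils;

// Both arrays must have the same length; entry i of the first maps to entry i of the second.
String[] latinCharacters = { /* Populate them */ };
String[] cyrillicCharacters = { /* Populate them */ };

public String translate(String origin){
 return StringUtils.replaceEach(origin, latinCharacters, cyrillicCharacters);
}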
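
If pulling in Commons Lang is not an option, a rough plain-JDK sketch of the same search/replacement-list idea is a loop of String.replace calls, much like PHP's str_replace with arrays. The order of the pairs matters: longer Latin keys have to be replaced before the shorter ones they contain, otherwise "sht" is never seen once "s" is gone. The class name SequentialReplacer and the sample pairs are made up for illustration. Note this shortcut is only safe here because the Cyrillic replacements never contain a Latin letter that a later pair searches for.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SequentialReplacer {
    // Hypothetical sample pairs in the form {Latin, Cyrillic}; extend as needed.
    private static final String[][] PAIRS = {
        {"s", "с"}, {"h", "х"}, {"sh", "ш"}, {"t", "т"}, {"sht", "щ"}, {"a", "а"}
    };

    public static String translate(String origin) {
        // Work on a copy so the sample table keeps its original order.
        String[][] ordered = PAIRS.clone();
        // Longest Latin key first, so "sht" is handled before "sh", "s", "h" and "t".
        Arrays.sort(ordered,
                Comparator.comparingInt((String[] pair) -> pair[0].length()).reversed());

        String result = origin;
        for (String[] pair : ordered) {
            // Replace every occurrence of the Latin key with its Cyrillic counterpart.
            result = result.replace(pair[0], pair[1]);
        }
        return result;
    }
}
```

With these pairs, SequentialReplacer.translate("shtash") gives "щаш".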

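Easy variant: call the translateNameRusToEn method with your string. Note that it goes in the opposite direction (Russian to Latin), lowercases the input, and turns any character missing from the table into "?".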

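import java.util.LinkedHashMap;
import java.util.Map;

public class TranslateCharactersUtil {

// Explicit type arguments: the diamond on an anonymous class only compiles on Java 9+.
public static final Map<Character, String> alphabetCharacters = new LinkedHashMap<Character, String>() {{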
    put('а', "a"); put('б', "b"); put('в', "v"); put('г', "g"); put('д', "d");
    put('е', "e"); put('э', "e"); put('ё', "yo"); put('ж', "zh"); put('з', "z");
    put('и', "i"); put('й', "j"); put('к', "k"); put('л', "l"); put('м', "m");
    put('н', "n"); put('о', "o"); put('п', "p"); put('р', "r"); put('с', "s");
    put('т', "t"); put('у', "u"); put('ф', "f"); put('х', "h"); put('ц', "ts");
    put('ч', "ch"); put('ш', "sh"); put('щ', "sch"); put('ь', ""); put('ъ', "");
    put('ы', "y"); put('ю', "yu");put('я', "ya"); put(' ', "-"); put('.', "");
    put(',', "");
}};


// Lowercases the name, then transliterates it character by character.
public static String translateNameRusToEn(String name) {
    StringBuilder stringBuilder = new StringBuilder();
    for (char symbol : name.toLowerCase().toCharArray()) {
        stringBuilder.append(getFromMap(symbol));
    }
    return stringBuilder.toString();
}

// Characters without an entry in the table become "?".
private static String getFromMap(char symbol) {
    return alphabetCharacters.getOrDefault(symbol, "?");
}

}
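
The variant above lowercases everything, so "Жар" and "жар" come out the same. If you want to keep capitalization (as the upper-case half of the PHP table in the question does), one possible sketch is to look up the lowercase form and capitalize only the first Latin letter of the result. The class name CasePreservingTranslit and the three-letter table are assumptions for illustration, and unlike the code above this sketch copies unknown characters through instead of emitting "?".

```java
import java.util.HashMap;
import java.util.Map;

public class CasePreservingTranslit {
    // Hypothetical three-letter table keyed by lowercase Cyrillic letters.
    private static final Map<Character, String> LOWER = new HashMap<>();

    static {
        LOWER.put('ж', "zh");
        LOWER.put('а', "a");
        LOWER.put('р', "r");
    }

    public static String translate(String name) {
        StringBuilder out = new StringBuilder();
        for (char symbol : name.toCharArray()) {
            String latin = LOWER.get(Character.toLowerCase(symbol));
            if (latin == null) {
                // No entry: keep the original character.
                out.append(symbol);
            } else if (Character.isUpperCase(symbol) && !latin.isEmpty()) {
                // Upper-case source letter: capitalize only the first Latin letter ('Ж' becomes "Zh").
                out.append(Character.toUpperCase(latin.charAt(0)))
                   .append(latin.substring(1));
            } else {
                out.append(latin);
            }
        }
        return out.toString();
    }
}
```

With this table, CasePreservingTranslit.translate("Жар") gives "Zhar" and CasePreservingTranslit.translate("жар!") gives "zhar!".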
