简体   繁体   中英

Transliteration from Hindi to English on Android without using Google API

I want to transliterate Hindi text into English in following way

Hindi - " आपका स्वागत है "

to

English - " aapka swagat hain "

I don't want to use Google's Translate API or any other translate API. If I use them it will end up giving me the translated version of my hindi text which is " Welcome ".

Is there any Transliterate library which I can use in my Android code?

I've heard of ICU but no luck in finding the procedure to use it in my code.

One potential way to solve the problem is to break it into two solvable ones and marry the two.

There are Hindi readers which can read Hindi devanagari script.

There are also dictating engines that phonetically transcribe in English.

Eg when someone leaves a message on my Vonage phone line in Gujarati, it records the audio, generates English text and emails me the wav file and the text. Mind you, when reading the text message it can occasionally be hilariously funny because Vonage assumes it is supposed to be in English, I am expecting the message to be in English, but after reading the message I realize it is Gujarati.

Google 'hindi reader for android' and 'phonetic transcription' for more info. If a Hindi reader could output a wav file which can be used as input to the transcription piece, then it may be a solution to your problem.

One possible and probably simple solution that i have found is that mapping of characters. So, i need to work on transliteration of the hindi words to english character, exact same situation as yours, but you can also work with other transliterations by using this approach. you just need to change the unicodes for the particular language.

Theory :- So, Converting Hindi or any other language's characters into English characters and vice versa, you first need to know about the range of the Unicodes for every characters of your language that you are transforming into English Characters.

For Example : To Convert Hindi Unicodes into English ASCII values, the range of the unicode is 0900-097F and this range varies as language changes. So, by using this unicodes you can map "How the particular unicode(Hindi Character) sounds", like ह will map to h in English Alphabet. So, this was the theory part.

Practical Approach : I need to create an app to get the Hindi voice input from user and convert it into English Alphabets. So, i have used STT(Speech to Text) Library and got hindi unicodes and from that hindi unicodes i am mapping that into English Characters.

Code :-

MainActivity.java

package android.example.com.conversion;

import android.content.ActivityNotFoundException;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.support.v7.app.AppCompatActivity;
import android.support.v7.widget.AppCompatButton;
import android.view.View;
import android.widget.TextView;
import android.widget.Toast;

import java.util.ArrayList;

public class MainActivity extends AppCompatActivity implements View.OnClickListener {

    //  Record Button
    AppCompatButton RecordBtn;

    //  TextView to show Original and recognized Text
    TextView Original,result;

    // Request Code for STT
    private final int SST_REQUEST_CODE = 101;

    //  Conversion Table Object...
    ConversionTable conversionTable;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        Original = findViewById(R.id.Original_Text);
        RecordBtn = findViewById(R.id.RecordBtn);
        result = findViewById(R.id.Recognized_Text);

        RecordBtn.setOnClickListener(this);
    }

    @Override
    public void onClick(View v) {
        switch (v.getId()) {
            case R.id.RecordBtn:
                Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);

                //  Use Off line Recognition Engine only...
                intent.putExtra(RecognizerIntent.EXTRA_PREFER_OFFLINE, false);

                //  Use Hindi Speech Recognition Model...
                intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "hi-IN");

                try {
                    startActivityForResult(intent, SST_REQUEST_CODE);
                } catch (ActivityNotFoundException a) {
                    Toast.makeText(getApplicationContext(),
                            getString(R.string.error),
                            Toast.LENGTH_SHORT).show();
                }

                break;
        }
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        switch (requestCode) {
            case SST_REQUEST_CODE:
                if (resultCode == RESULT_OK && null != data) {
                    ArrayList<String> getResult = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                    Original.setText(getResult.get(0));
                    conversionTable = new ConversionTable();
                    String Transformed_String = conversionTable.transform(getResult.get(0));
                    result.setText(Transformed_String);
                }
                break;
        }
    }
}

My ConversationTable.java class: this will map the values into English Alphabets.

package android.example.com.conversion;

import android.util.Log;

import java.util.ArrayList;
import java.util.Hashtable;

public class ConversionTable
{
    private String TAG = "Conversation Table";

    private Hashtable<String,String> unicode;

    private void populateHashTable()
    {
        unicode = new Hashtable<>();

        // unicode
        unicode.put("\u0901","rha"); // anunAsika - cchandra bindu, using ~ to // *
        unicode.put("\u0902","n"); // anusvara
        unicode.put("\u0903","ah"); // visarga

        unicode.put("\u0940","ee");
        unicode.put("\u0941","u");
        unicode.put("\u0942","oo");
        unicode.put("\u0943","rhi");
        unicode.put("\u0944","rhee");   //  * = Doubtful Case
        unicode.put("\u0945","e");
        unicode.put("\u0946","e");
        unicode.put("\u0947","e");
        unicode.put("\u0948","ai");
        unicode.put("\u0949","o");
        unicode.put("\u094a","o");
        unicode.put("\u094b","o");
        unicode.put("\u094c","au");

        unicode.put("\u094d","");
        unicode.put("\u0950","om");

        unicode.put("\u0958","k");
        unicode.put("\u0959","kh");
        unicode.put("\u095a","gh");
        unicode.put("\u095b","z");
        unicode.put("\u095c","dh");    // *
        unicode.put("\u095d","rh");
        unicode.put("\u095e","f");

        unicode.put("\u095f","y");
        unicode.put("\u0960","ri");
        unicode.put("\u0961","lri");
        unicode.put("\u0962","lr");       //  *
        unicode.put("\u0963","lree");     //  *

        unicode.put("\u093E","aa");
        unicode.put("\u093F","i");

        //  Vowels and Consonants...
        unicode.put("\u0905","a");
        unicode.put("\u0906","a");
        unicode.put("\u0907","i");
        unicode.put("\u0908","ee");
        unicode.put("\u0909","u");
        unicode.put("\u090a","oo");
        unicode.put("\u090b","ri");
        unicode.put("\u090c","lri"); // *
        unicode.put("\u090d","e"); // *
        unicode.put("\u090e","e"); // *
        unicode.put("\u090f","e");
        unicode.put("\u0910","ai");
        unicode.put("\u0911","o");
        unicode.put("\u0912","o");
        unicode.put("\u0913","o");
        unicode.put("\u0914","au");

        unicode.put("\u0915","k");
        unicode.put("\u0916","kh");
        unicode.put("\u0917","g");
        unicode.put("\u0918","gh");
        unicode.put("\u0919","ng");
        unicode.put("\u091a","ch");
        unicode.put("\u091b","chh");
        unicode.put("\u091c","j");
        unicode.put("\u091d","jh");
        unicode.put("\u091e","ny");
        unicode.put("\u091f","t"); // Ta as in Tom
        unicode.put("\u0920","th");
        unicode.put("\u0921","d"); // Da as in David
        unicode.put("\u0922","dh");
        unicode.put("\u0923","n");
        unicode.put("\u0924","t"); // ta as in tamasha
        unicode.put("\u0925","th"); // tha as in thanks
        unicode.put("\u0926","d"); // da as in darvaaza
        unicode.put("\u0927","dh"); // dha as in dhanusha
        unicode.put("\u0928","n");
        unicode.put("\u0929","nn");
        unicode.put("\u092a","p");
        unicode.put("\u092b","ph");
        unicode.put("\u092c","b");
        unicode.put("\u092d","bh");
        unicode.put("\u092e","m");
        unicode.put("\u092f","y");
        unicode.put("\u0930","r");
        unicode.put("\u0931","rr");
        unicode.put("\u0932","l");
        unicode.put("\u0933","ll"); // the Marathi and Vedic 'L'
        unicode.put("\u0934","lll"); // the Marathi and Vedic 'L'
        unicode.put("\u0935","v");
        unicode.put("\u0936","sh");
        unicode.put("\u0937","ss");
        unicode.put("\u0938","s");
        unicode.put("\u0939","h");

        // represent it\
        //  unicode.put("\u093c","'"); // avagraha using "'"
        //  unicode.put("\u093d","'"); // avagraha using "'"
        unicode.put("\u0969","3"); // 3 equals to pluta
        unicode.put("\u014F","Z");// Z equals to upadhamaniya
        unicode.put("\u0CF1","V");// V equals to jihvamuliya....but what character have u settled for jihvamuliya
     /*   unicode.put("\u0950","Ω"); // aum
        unicode.put("\u0958","κ"); // Urdu qaif
        unicode.put("\u0959","Κ"); //Urdu qhe
        unicode.put("\u095A","γ"); // Urdu gain
        unicode.put("\u095B","ζ"); //Urdu zal, ze, zoe
        unicode.put("\u095E","φ"); // Urdu f
        unicode.put("\u095C","δ"); // Hindi 'dh' as in padh
        unicode.put("\u095D","Δ"); // hindi dhh*/
        unicode.put("\u0926\u093C","τ"); // Urdu dwad
        unicode.put("\u0924\u093C","θ"); // Urdu toe
        unicode.put("\u0938\u093C","σ"); // Urdu swad, se
    }

    ConversionTable()
    {
        populateHashTable();
    }

    public String transform(String s1)
    {

        StringBuilder transformed = new StringBuilder();

        int strLen = s1.length();
        ArrayList<String> shabda = new ArrayList<>();
        String lastEntry = "";

        for (int i = 0; i < strLen; i++)
        {
            char c = s1.charAt(i);
            String varna = String.valueOf(c);

            Log.d(TAG, "transform: " + varna + "\n");

            String halant = "0x0951";

            if (VowelUtil.isConsonant(varna))
            {
                Log.d(TAG, "transform: " + unicode.get(varna));
                shabda.add(unicode.get(varna));
                shabda.add(halant); //halant
                lastEntry = halant;
            }

            else if (VowelUtil.isVowel(varna))
            {
                Log.d(TAG, "transform: " + "Vowel Detected...");
                if (halant.equals(lastEntry))
                {
                    if (varna.equals("a"))
                    {
                        shabda.set(shabda.size() - 1,"");
                    }
                    else
                    {
                        shabda.set(shabda.size() - 1, unicode.get(varna));
                    }
                }

                else
                {
                    shabda.add(unicode.get(varna));
                }
                lastEntry = unicode.get(varna);
            } // end of else if is-Vowel

            else if (unicode.containsKey(varna))
            {
                shabda.add(unicode.get(varna));
                lastEntry = unicode.get(varna);
            }
            else
            {
                shabda.add(varna);
                lastEntry = varna;
            }

        } // end of for

        for (String string: shabda)
        {
            transformed.append(string);
        }

        //Discard the shabda array
        shabda = null;
        return transformed.toString(); // return transformed;
    }

}

My VowelUtil.class : This will check for vowels and consonants, although it's not required to check for that.

package android.example.com.conversion;

public class VowelUtil {

    protected static boolean isVowel(String strVowel) {
        // Log.logInfo("came in is_Vowel: Checking whether string is a Vowel");
        return strVowel.equals("a") || strVowel.equals("aa") || strVowel.equals("i") || strVowel.equals("ee") ||
                strVowel.equals("u") || strVowel.equals("oo") || strVowel.equals("ri") || strVowel.equals("lri") || strVowel.equals("e")
                || strVowel.equals("ai") || strVowel.equals("o") || strVowel.equals("au") || strVowel.equals("om");
    }

    protected static boolean isConsonant(String strConsonant) {
        // Log.logInfo("came in is_consonant: Checking whether string is a
        // consonant");
        return strConsonant.equals("k") || strConsonant.equals("kh") || strConsonant.equals("g")
                || strConsonant.equals("gh") || strConsonant.equals("ng") || strConsonant.equals("ch") || strConsonant.equals("chh") || strConsonant.equals("j")
                || strConsonant.equals("jh") || strConsonant.equals("ny") || strConsonant.equals("t") || strConsonant.equals("th") ||
                strConsonant.equals("d") || strConsonant.equals("dh") || strConsonant.equals("n") || strConsonant.equals("nn") || strConsonant.equals("p") ||
                strConsonant.equals("ph") || strConsonant.equals("b") || strConsonant.equals("bh") || strConsonant.equals("m") || strConsonant.equals("y") ||
                strConsonant.equals("r") || strConsonant.equals("rr") || strConsonant.equals("l") || strConsonant.equals("ll") || strConsonant.equals("lll") ||
                strConsonant.equals("v") || strConsonant.equals("sh") || strConsonant.equals("ss") || strConsonant.equals("s") || strConsonant.equals("h") ||
                strConsonant.equals("3") || strConsonant.equals("z") || strConsonant.equals("v") || strConsonant.equals("Ω") ||
                strConsonant.equals("κ") || strConsonant.equals("K") || strConsonant.equals("γ") || strConsonant.equals("ζ") || strConsonant.equals("φ") ||
                strConsonant.equals("δ") || strConsonant.equals("Δ") || strConsonant.equals("τ") || strConsonant.equals("θ") || strConsonant.equals("σ");
    }
}

This is the simplest method by which you can do transliteration and this is general method, which you can apply to do any language translateration to any other language transliteration.You just need to supply perfect unicodes to map to other language's unicodes.

Results :

结果截图...

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM