How to transliterate non-Latin scripts?

Question

I'm playing around with transliteration in PHP using iconv. In particular, I want to normalise accented characters and romanize other scripts, converting UTF-8 to plain ASCII.

While many characters work (such as Ž -> Z), others give odd results or raise errors.

For example, E ACUTE é (U+00E9) transliterates to ASCII with a single quote (U+0027) preceding the e, as if it's trying to represent the diacritic mark I'm trying to get rid of.

$utf_8 = "\xC3\xA9"; // <- é
$ascii = iconv( 'UTF-8', 'ASCII//TRANSLIT', $utf_8 );
// returns "'e", not "e"

Non-Latin scripts are worse. For example, Greek capital sigma Σ (U+03A3), which should transliterate to Latin S, is not recognised at all and raises a notice:

$utf_8 = "\xCE\xA3"; // <- Σ
$ascii = iconv( 'UTF-8', 'ASCII//TRANSLIT', $utf_8 );
// Raises notice: iconv(): Detected an illegal character in input string

I can just about cope with the first one, but how can I transliterate "Σ" to "S", and do this reliably across other scripts that have equivalent characters?

I don't mind generating my own tables if there is a good source that works for most European languages.

Note that I've tried various collation tables, which are useful for normalising accented Latin characters, but they don't work for transliterating between scripts.

Answer 1

I've not had much luck using iconv. It always manages to throw a bunch of notices.

The best luck I've had is with using a custom transliteration table. It's far from perfect but at least you'll feel like you have some solid ground.

I've not found a good single source for transliteration tables. My unfamiliarity with anything but the Latin script isn't helping.
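
To make the custom-table idea concrete, here is a minimal sketch, assuming a hand-built lookup array applied with strtr(). The table contents, the $table variable, and the transliterate_to_ascii() helper are all hypothetical names made up for illustration, and the handful of entries is nowhere near complete for any language.

```php
<?php
// Hypothetical, deliberately tiny table. Keys are written as UTF-8 byte
// escapes so the example does not depend on the encoding of this file.
$table = array(
    "\xC3\xA9" => 'e',  // é U+00E9
    "\xC5\xBD" => 'Z',  // Ž U+017D
    "\xC3\x9F" => 'ss', // ß U+00DF
    "\xCE\xA3" => 'S',  // Σ U+03A3
    "\xCF\x83" => 's',  // σ U+03C3
    "\xCF\x82" => 's',  // ς U+03C2 (final sigma)
);

function transliterate_to_ascii($utf8, array $table)
{
    // strtr() with an array replaces every key it finds in a single pass.
    $mapped = strtr($utf8, $table);

    // Anything still outside ASCII has no table entry; mark it with '?'
    // so gaps in the table are visible instead of silently dropped.
    $ascii = preg_replace('/[^\x00-\x7F]/u', '?', $mapped);

    // preg_replace() returns null when the input is not valid UTF-8;
    // this sketch simply returns an empty string in that case.
    return $ascii === null ? '' : $ascii;
}

echo transliterate_to_ascii("\xC3\xA9\xCE\xA3", $table), "\n"; // prints "eS"
```

Because the table is plain data, it can be grown one script at a time, and unmapped characters show up as '?' rather than as notices.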

Answer 2

I've tried something similar. It's mostly based on the Doctrine 1 code and it isn't perfect, but it seems to work with all the test data I've thrown at it.

2 answers

solution1
0 2013-07-25 16:44:35

solution2
0 2013-07-26 08:39:03