PHP String Function with non-English languages

I was trying the range() function with a non-English language. It is not working.

$i = 0;
$alphabets = [];
foreach (range('क', 'म') as $ab) {
    ++$i;
    $alphabets[$ab] = $i;
}

Output: à = 1

These are Hindi (India) alphabets. The loop iterates only once (as the output shows).

I don't know what to do about this.

So, if possible, please tell me how to fix it, and what I should do first before working with non-English text in any PHP function.

Short answer: it's not possible to use range like that.

Explanation

You are passing the string 'क' as the start of the range and 'म' as the end. You are getting only one character back, and that character is à.

You are getting back à because your source file is encoded (saved) in UTF-8. One can tell this from the fact that à is code point U+00E0, while 0xE0 is also the first byte of the UTF-8 encoded form of 'क' (which is 0xE0 0xA4 0x95). Sadly, PHP strings have no notion of encoding, so range just takes the first byte it sees in the string and uses that as the "start" character.

You are getting back only à because the UTF-8 encoded form of 'म' also starts with 0xE0 (so PHP also thinks that the "end character" is 0xE0, or à).
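
The byte-level claim above can be checked directly. The following is a minimal sketch, not part of the original answer; it assumes PHP 7.0 or later for the "\u{...}" escapes, and the variable names are made up for illustration.

```php
<?php
// "\u{0915}" is क and "\u{092E}" is म. Using escapes keeps this sketch
// independent of how the file itself is saved (needs PHP 7.0+).
$start = "\u{0915}";
$end   = "\u{092E}";

// bin2hex() shows the raw bytes PHP actually stores for each string.
$startHex = bin2hex($start);
$endHex   = bin2hex($end);
echo $startHex, "\n"; // e0a495
echo $endHex, "\n";   // e0a4ae

// Indexing a string with [0] yields its first byte, not its first character.
$sameFirstByte = ($start[0] === $end[0]);
var_dump($sameFirstByte); // bool(true)
```

Both strings begin with the byte 0xE0, which is why a byte-oriented range sees the same start and end.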

Solution

You can write range as a for loop yourself, as long as there is a function that returns the Unicode code point of a UTF-8 character (and one that does the reverse). So I googled and found these:

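// Returns the UTF-8 character with code point $intval.
// pack('n') emits a 16-bit big-endian value, so this only covers
// code points up to U+FFFF.
function unichr($intval) {
    return mb_convert_encoding(pack('n', $intval), 'UTF-8', 'UTF-16BE');
}

// Returns the code point for a UTF-8 character.
// UCS-2 is a fixed two-byte encoding, so this is also limited to U+FFFF.
function uniord($u) {
    $k = mb_convert_encoding($u, 'UCS-2LE', 'UTF-8');
    $k1 = ord(substr($k, 0, 1)); // low byte
    $k2 = ord(substr($k, 1, 1)); // high byte
    return $k2 * 256 + $k1;
}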

With the above, you can now write:

for($char = uniord('क'); $char <= uniord('म'); ++$char) {
    $alphabet[] = unichr($char);
}

print_r($alphabet);

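
On PHP 7.2 and later, the mbstring extension ships mb_ord() and mb_chr(), which can stand in for the two helpers above. The following is a sketch under that assumption; utf8_range is a hypothetical helper name, not a PHP built-in, and it assumes both arguments are valid single UTF-8 characters with the first not after the second.

```php
<?php
// Hypothetical helper: a code-point-aware counterpart to range().
// Requires the mbstring extension (PHP 7.2+ for mb_ord/mb_chr).
function utf8_range(string $first, string $last): array
{
    $from = mb_ord($first, 'UTF-8'); // code point of the first character
    $to   = mb_ord($last, 'UTF-8');  // code point of the last character

    $chars = [];
    for ($codePoint = $from; $codePoint <= $to; ++$codePoint) {
        $chars[] = mb_chr($codePoint, 'UTF-8');
    }
    return $chars;
}

// "\u{0915}" is क and "\u{092E}" is म.
$letters = utf8_range("\u{0915}", "\u{092E}");

// Build the letter => position map from the question, starting at 1.
$alphabets = [];
foreach ($letters as $index => $letter) {
    $alphabets[$letter] = $index + 1;
}

echo count($letters), "\n"; // 26
```

The loop walks numeric code points rather than bytes, so it covers every code point from U+0915 through U+092E inclusive.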

The lazy solution would be to use html_entity_decode(), and range() only for the numeric ranges it was originally intended for (that it works with ASCII is a bit silly anyway):

$i = 0;
foreach (range(0x0915, 0x092E) as $char) {
    // Decode the numeric entity, e.g. "&#2325;", into its UTF-8 character
    $char = html_entity_decode("&#$char;", ENT_COMPAT, "UTF-8");
    $alphabets[$char] = ++$i;
}

Another solution would be to translate the endpoints, get the range, and then translate back again.

$first = file_get_contents("http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&langpair=|en&q=क");
$second = file_get_contents("http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&langpair=|en&q=म"); //not real value
$jsonfirst = json_decode($first);
$jsonsecond = json_decode($second);
$f = $jsonfirst->responseData->translatedText;
$l = $jsonsecond->responseData->translatedText;
foreach (range($f, $l) as $ab) {
    echo $ab;
}

Outputs:

ABCDEFGHI

To translate back, use array_map with a callback function that translates each of the English values back to Hindi.
