ICU: Transliterate and then remove all non-alphanumeric characters
Can it be done with ICU without falling back to regex?
Currently I normalize filenames like this:
protected function normalizeFilename($filename)
{
    $transliterator = Transliterator::createFromRules(
        'Any-Latin; Latin-ASCII; [:Punctuation:] Remove;'
    );
    $filename = $transliterator->transliterate($filename);
    $filename = preg_replace('/[^A-Za-z0-9_]/', '', $filename);
    return $filename;
}
Can I get rid of the regular expression here and do everything with ICU calls?
I don't see anything wrong with what you're doing now.
ICU transliteration is first and foremost language oriented. It tries to preserve meaning.
Regular expressions, on the other hand, can manipulate characters in detail, giving you the assurance that the file name is restricted to the selected characters.
The combination is perfect in this case.
I have, of course, looked for a solution to your question. But to be honest, I couldn't find anything that would work on all possible inputs.
For instance, not all characters that we would consider punctuation marks are removed by [:Punctuation:] Remove;. Try the Russian name: Корнильев, Кирилл. After applying your id it becomes: Kornilʹev Kirill. The ʹ that is left over is clearly not classified as a punctuation mark, but you don't want it in your file name.
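To check which characters slip past the ICU step for a given input, something like the following sketch could help. It assumes PHP's intl extension is loaded, and it uses Transliterator::create with a compound ID rather than createFromRules as in the question. The helper name leftoverCharacters is hypothetical. The exact leftovers may differ between ICU versions, so no output is shown.

```php
<?php
// Hypothetical helper: lists the characters that survive the ICU step
// but are neither ASCII letters, digits, underscores nor whitespace.
function leftoverCharacters(string $input): array
{
    // Assumption: the intl extension is available.
    $transliterator = Transliterator::create(
        'Any-Latin; Latin-ASCII; [:Punctuation:] Remove'
    );
    if ($transliterator === null) {
        throw new RuntimeException(intl_get_error_message());
    }

    $output = $transliterator->transliterate($input);
    if ($output === false) {
        throw new RuntimeException($transliterator->getErrorMessage());
    }

    // Collect every character outside the allowed ASCII set.
    preg_match_all('/[^A-Za-z0-9_\s]/u', $output, $matches);

    return array_values(array_unique($matches[0]));
}

// Print whatever the ICU step left behind for the name used above.
print_r(leftoverCharacters('Корнильев, Кирилл'));
```

An empty array means the ICU step alone was enough for that particular input; anything else still needs the regular expression.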
So I would advise using the correct tool for the job: Latin-ASCII; as the id will do. Nice and simple. There is really nothing wrong with this.
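Put together, the combination described here might look like the sketch below: ICU does the language-aware transliteration, and the regular expression enforces the final set of allowed characters. It assumes the intl extension is loaded and uses Transliterator::create with a compound ID. Keeping Any-Latin in front of Latin-ASCII is an assumption carried over from the question, so that non-Latin scripts such as Cyrillic are still converted.

```php
<?php
// Sketch: transliterate with ICU, then restrict the result with a regex.
function normalizeFilename(string $filename): string
{
    // Assumption: the intl extension is available.
    $transliterator = Transliterator::create('Any-Latin; Latin-ASCII');
    if ($transliterator === null) {
        throw new RuntimeException(intl_get_error_message());
    }

    $ascii = $transliterator->transliterate($filename);
    if ($ascii === false) {
        throw new RuntimeException($transliterator->getErrorMessage());
    }

    // Whatever ICU left behind (punctuation, spaces, modifier letters)
    // is dropped here; preg_replace returns null on error, hence the fallback.
    return preg_replace('/[^A-Za-z0-9_]/', '', $ascii) ?? '';
}

echo normalizeFilename('Корнильев, Кирилл'), PHP_EOL;
```

Because the regular expression runs last, the result contains only A-Z, a-z, 0-9 and underscore, whatever the transliterator produces.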
PS: Personally, I think whoever wrote the ICU user guide should not be complimented on a job well done. What a mess.