Regex to match the initials of a name - PCRE

Question

I have a regular expression to get the initials of a name, like below:

/\b\p{L}\./gu

It works fine with English and other languages until grapheme clusters and combining characters occur. For example,
क in Hindi and
ಕ in Kannada are matched.
But
के in Hindi and
ಕೆ in Kannada are not matched by this regex.
I am trying to get the initials from a name like JPMorgan, etc.
Any help would be greatly appreciated.

Answer 1

You need to match diacritic marks after base letters using \p{M}*:

'~\b(?<!\p{M})\p{L}\p{M}*\.~u'

The pattern matches:

\b - a word boundary
(?<!\p{M}) - the char before the current position must not be a diacritic mark (without it, a match can occur within a single word)
\p{L} - any base Unicode letter
\p{M}* - zero or more diacritic marks
\. - a dot.
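To make the effect of \p{M}* concrete, here is a minimal sketch that runs the pattern from the question and the pattern from this answer over the same sample string. The variable names ($s, $before, $after) are only illustrative, and the sketch assumes a PHP build where the u modifier makes \b Unicode-aware, as the question's own results suggest.

```php
<?php
// Sample input from the question: two bare letters and two letters
// that carry a combining vowel sign.
$s = "क. ಕ. के. ಕೆ. ";

// Pattern from the question: exactly one letter, then a dot.
preg_match_all('~\b\p{L}\.~u', $s, $before);

// Pattern from this answer: one letter, any combining marks, then a dot.
preg_match_all('~\b(?<!\p{M})\p{L}\p{M}*\.~u', $s, $after);

// Print how many initials each pattern found.
echo count($before[0]), "\n";
echo count($after[0]), "\n";
```

On this input the first pattern should find only the two bare letters, while the second should find all four initials, because the vowel sign that sits between the letter and the dot is consumed by \p{M}*.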

See the PHP demo online:

$s = "क. ಕ. के. ಕೆ. ";
echo preg_replace('~\b(?<!\p{M})\p{L}\p{M}*+\.~u', '<pre>$0</pre>', $s); 
// => <pre>क.</pre> <pre>ಕ.</pre> <pre>के.</pre> <pre>ಕೆ.</pre>
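If the goal is to collect the initials rather than wrap them in markup, the same pattern can be used with preg_match_all and a capturing group. This is only a sketch under that assumption: extract_initials is a hypothetical helper name, not something from the question or the demo, and the input names are made up for illustration.

```php
<?php
// Hypothetical helper: returns every "letter + optional marks" that is
// directly followed by a dot, without the dot itself.
function extract_initials(string $name): array
{
    // Group 1 captures the base letter plus its combining marks.
    preg_match_all('~\b(?<!\p{M})(\p{L}\p{M}*+)\.~u', $name, $m);
    return $m[1];
}

$latin = extract_initials("J. P. Morgan");
$indic = extract_initials("के. ಕೆ. Morgan");

print_r($latin);
print_r($indic);
```

The trailing word is skipped in both calls because no letter in it is followed by a dot.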

Solution 1
2 Accepted 2019-01-14 09:39:00