Regex in PHP: take all the words after the first one in a string and truncate each of them to its first character
I'm quite terrible at regexes.
I have a string that may have one or more words in it (generally 2 or 3), usually a person's name, for example:
$str1 = 'John Smith';
$str2 = 'John Doe';
$str3 = 'David X. Cohen';
$str4 = 'Kim Jong Un';
$str5 = 'Bob';
I'd like to convert each as follows:
$str1 = 'John S.';
$str2 = 'John D.';
$str3 = 'David X. C.';
$str4 = 'Kim J. U.';
$str5 = 'Bob';
My guess is that I should first match the first word, like so:
preg_match( '/^([\w\-]+)/', $str1, $first_word )
then all the words after the first one... but how do I match those? Should I use preg_match again with offset = 1 in the arguments? But that offset is in characters or bytes, right?
Anyway, after I've matched the words following the first, if they exist, should I do something like this for each of them:
$second_word = substr( $following_word, 0, 1 ) . '. ';
Or is my approach completely wrong?
Thanks
P.S. It would be a boon if the regex could keep the first two words whole when the string contains three or more words (e.g. 'Kim Jong U.').
It can be done in a single preg_replace using a regex.
You can search using this regex:
^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+
And replace by:
$1.
Code:
$name = preg_replace('/^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+/', '$1.', $name);
Explanation:
(*FAIL) behaves like a failing negative assertion and is a synonym for (?!).
(*SKIP) defines a point beyond which the regex engine is not allowed to backtrack when the subpattern fails later.
(*SKIP)(*FAIL) together provide a nice way around the restriction that you cannot have a variable-length lookbehind in the above regex.
^\w+(?:$| +)(*SKIP)(*F) matches the first word in a name and skips it (does nothing).
(\w)\w+ matches every other word and replaces it with its first letter and a dot.
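To see the (*SKIP)(*F) pattern in action, here is a minimal sketch that runs it over the sample names from the question. The helper name abbreviateWithSkipFail() is made up for illustration; only preg_replace and the pattern come from the answer.

```php
<?php
// Hypothetical helper wrapping the pattern from the answer above.
function abbreviateWithSkipFail(string $name): string
{
    // Alternative 1 consumes the first word and then forces a failure, so it is skipped.
    // Alternative 2 captures the first character of every later multi-character word.
    // preg_replace returns null on error, so fall back to the input in that case.
    return preg_replace('/^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+/', '$1.', $name) ?? $name;
}

echo abbreviateWithSkipFail('John Smith'), PHP_EOL;     // John S.
echo abbreviateWithSkipFail('David X. Cohen'), PHP_EOL; // David X. C.
echo abbreviateWithSkipFail('Kim Jong Un'), PHP_EOL;    // Kim J. U.
echo abbreviateWithSkipFail('Bob'), PHP_EOL;            // Bob
```

Note that "X." is left alone because (\w)\w+ needs at least two word characters to match.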
You could use a positive lookbehind assertion.
(?<=\h)([A-Z])\w+
OR
Use this regex if you want to turn Bob F into Bob F.
(?<=\h)([A-Z])\w*(?!\.)
Then replace the matched characters with \1.
Code would be like:
preg_replace('~(?<=\h)([A-Z])\w+~', '\1.', $string);
(?<=\h)([A-Z]) captures every uppercase letter that is preceded by a horizontal whitespace character.
\w+ matches one or more word characters.
Replacing the matched characters with the contents of group 1 (\1) plus a dot gives you the desired output.
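The two lookbehind patterns differ only in how they treat a bare trailing initial. As a rough sketch (the helper name abbreviateWithLookbehind() and its $lenient flag are assumptions made for this example), they can be compared side by side:

```php
<?php
// Hypothetical helper: $lenient selects the second pattern from the answer above.
function abbreviateWithLookbehind(string $name, bool $lenient = false): string
{
    $pattern = $lenient
        ? '~(?<=\h)([A-Z])\w*(?!\.)~' // also matches a lone initial not yet followed by a dot
        : '~(?<=\h)([A-Z])\w+~';      // needs at least one word character after the initial
    // preg_replace returns null on error, so fall back to the input in that case.
    return preg_replace($pattern, '\1.', $name) ?? $name;
}

echo abbreviateWithLookbehind('David X. Cohen'), PHP_EOL; // David X. C.
echo abbreviateWithLookbehind('Bob F'), PHP_EOL;          // Bob F
echo abbreviateWithLookbehind('Bob F', true), PHP_EOL;    // Bob F.
echo abbreviateWithLookbehind('kim jong un'), PHP_EOL;    // kim jong un
```

Because both patterns require [A-Z], names written in lowercase are left unchanged, as the last call shows.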
A simple solution with only a look-ahead and a word boundary check:
preg_replace('~(?!^)\b(\w)\w+~', '$1.', $string);
(\w)\w+ is a word in the name, with the first character captured.
(?!^)\b performs a word boundary check (\b) and makes sure the match is not at the start of the string ((?!^)).
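A short sketch of the word-boundary version applied to the question's sample names follows. The helper name abbreviateWithBoundary() is hypothetical, and the trim() call is an addition that is not part of the answer.

```php
<?php
// Hypothetical helper wrapping the look-ahead plus word-boundary pattern.
function abbreviateWithBoundary(string $name): string
{
    // trim() is an assumption added here: with leading whitespace the first word
    // would no longer sit at the start of the string and would be abbreviated too.
    // preg_replace returns null on error, so fall back to the input in that case.
    return preg_replace('~(?!^)\b(\w)\w+~', '$1.', trim($name)) ?? $name;
}

echo abbreviateWithBoundary('John Doe'), PHP_EOL;       // John D.
echo abbreviateWithBoundary('David X. Cohen'), PHP_EOL; // David X. C.
echo abbreviateWithBoundary('Kim Jong Un'), PHP_EOL;    // Kim J. U.
echo abbreviateWithBoundary('Bob'), PHP_EOL;            // Bob
```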