Regex in PHP: truncate every word after the first word in a string to its first character

Question

I'm quite terrible at regexes.

I have a string that may contain one or more words (generally two or three), usually a person's name, for example:

$str1 = 'John Smith';
$str2 = 'John Doe';
$str3 = 'David X. Cohen';
$str4 = 'Kim Jong Un';
$str5 = 'Bob';

I'd like to convert each as follows:

$str1 = 'John S.';
$str2 = 'John D.';
$str3 = 'David X. C.';
$str4 = 'Kim J. U.';
$str5 = 'Bob';

My guess is that I should first match the first word, like so:

preg_match( '/^([\w\-]+)/', $str1, $first_word );

Then all the words after the first one... but how do I match those? Should I call preg_match again and pass offset = 1 in the arguments? But is that offset in characters or in bytes?

Anyway, after I have matched the words following the first, if they exist, should I do something like this for each of them:

$second_word = substr( $following_word, 0, 1 ) . '. ';

Or is my approach completely wrong?

Thanks

PS: it would be a boon if the regex could keep the first two words whole when the string contains three or more words (e.g. 'Kim Jong U.').

Answer 1

It can be done in a single preg_replace call using a regex.

You can search using this regex:

^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+

And replace with:

$1.


Code:

$name = preg_replace('/^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+/', '$1.', $name);

Explanation:

(*FAIL), abbreviated (*F), behaves like a failing negative assertion and is a synonym for (?!).
(*SKIP) defines a point beyond which the regex engine is not allowed to backtrack when the subpattern fails later; the next match attempt starts from that point.
(*SKIP)(*FAIL) together provide a nice way around the restriction that you cannot have a variable-length lookbehind, which the above regex would otherwise need.
^\w+(?:$| +)(*SKIP)(*F) matches the first word in a name and skips it (leaves it unchanged).
(\w)\w+ matches each remaining word of two or more characters and replaces it with its first letter and a dot.
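To make the explanation above concrete, here is a minimal sketch that runs the same preg_replace call over the five sample names from the question. The function name abbreviate_after_first is a hypothetical wrapper introduced only for illustration; the pattern and replacement are the ones given in this answer.

```php
<?php
// Hypothetical helper (not part of the original answer): wraps the single
// preg_replace call so it can be reused on several names.
function abbreviate_after_first(string $name): string
{
    // Alternative 1 consumes the first word plus trailing spaces, then
    // (*SKIP)(*F) discards that match and resumes the search after it.
    // Alternative 2 captures the first character of any later word that
    // has at least two word characters.
    return preg_replace('/^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+/', '$1.', $name);
}

$names = ['John Smith', 'John Doe', 'David X. Cohen', 'Kim Jong Un', 'Bob'];

foreach ($names as $name) {
    echo abbreviate_after_first($name), PHP_EOL;
}
// Prints:
// John S.
// John D.
// David X. C.
// Kim J. U.
// Bob
```

The existing "X." in 'David X. Cohen' is left alone because (\w)\w+ needs at least two word characters, so an initial that is already abbreviated does not get a second dot.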

Answer 2

You could use a positive lookbehind assertion.

(?<=\h)([A-Z])\w+

OR

Use this regex if you also want to turn a trailing single letter such as Bob F into Bob F.:

(?<=\h)([A-Z])\w*(?!\.)

Then replace the matched characters with \1. (the first capture group followed by a dot).


The code would look like this:

preg_replace('~(?<=\h)([A-Z])\w+~', '\1.', $string);


(?<=\h)([A-Z]) captures an uppercase letter that is preceded by a horizontal whitespace character.
\w+ matches one or more word characters.
Replacing the match with the contents of capture group 1 (\1) plus a dot gives you the desired output.
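As a rough illustration of how the two lookbehind patterns in this answer differ, the sketch below applies each of them to a few names. The helper name abbreviate_with_lookbehind and the two pattern variables are assumptions made for the example; the patterns themselves are copied from the answer. Note that [A-Z] only matches ASCII uppercase letters, so lowercase names are left untouched.

```php
<?php
// Pattern from the answer: capitalised words of two or more characters
// that follow a horizontal whitespace character.
$patternA = '~(?<=\h)([A-Z])\w+~';

// Second pattern from the answer: also matches a lone capital letter,
// as long as it is not already followed by a dot.
$patternB = '~(?<=\h)([A-Z])\w*(?!\.)~';

// Hypothetical helper introduced for this example only.
function abbreviate_with_lookbehind(string $name, string $pattern): string
{
    // '$1.' is equivalent to the '\1.' replacement used in the answer.
    return preg_replace($pattern, '$1.', $name);
}

echo abbreviate_with_lookbehind('Kim Jong Un', $patternA), PHP_EOL;    // Kim J. U.
echo abbreviate_with_lookbehind('David X. Cohen', $patternA), PHP_EOL; // David X. C.
echo abbreviate_with_lookbehind('Bob F', $patternA), PHP_EOL;          // Bob F
echo abbreviate_with_lookbehind('Bob F', $patternB), PHP_EOL;          // Bob F.
echo abbreviate_with_lookbehind('David X. Cohen', $patternB), PHP_EOL; // David X. C.
```

The first word is never touched by either pattern because nothing precedes it, so the lookbehind for horizontal whitespace cannot succeed there.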

Answer 3

A simple solution using only a negative lookahead and a word boundary check:

preg_replace('~(?!^)\b(\w)\w+~', '$1.', $string);

(\w)\w+ is a word in the name, with its first character captured.
(?!^)\b performs a word boundary check (\b) and makes sure the match is not at the start of the string ((?!^)).
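The following sketch exercises this pattern on the sample names from the question. The wrapper name abbreviate_with_boundary is hypothetical and exists only for the example; the pattern and replacement are taken from this answer as written.

```php
<?php
// Hypothetical helper introduced for illustration.
function abbreviate_with_boundary(string $name): string
{
    // (?!^) rejects a match at the very start of the string, so the first
    // word survives; \b anchors the match at the start of a later word.
    return preg_replace('~(?!^)\b(\w)\w+~', '$1.', $name);
}

echo abbreviate_with_boundary('John Smith'), PHP_EOL;      // John S.
echo abbreviate_with_boundary('David X. Cohen'), PHP_EOL;  // David X. C.
echo abbreviate_with_boundary('Kim Jong Un'), PHP_EOL;     // Kim J. U.
echo abbreviate_with_boundary('Bob'), PHP_EOL;             // Bob

// \b also matches after a hyphen, so the second half of a hyphenated
// first name is abbreviated as well.
echo abbreviate_with_boundary('Jean-Luc Picard'), PHP_EOL; // Jean-L. P.
```

If hyphenated first names need to stay whole, the pattern would have to be adapted; the behaviour shown in the last line is simply what the pattern does as given.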


3 answers

Answer 1: score 4, accepted, 2015-01-10 09:55:18
Answer 2: score 1, 2015-01-10 10:00:04
Answer 3: score 0, 2015-01-10 12:15:33
