Regex in PHP: take all the words after the first one in a string and truncate each of them to its first character

I'm quite terrible at regexes.

I have a string that may contain one or more words (generally 2 or 3), usually a person's name, for example:

$str1 = 'John Smith';
$str2 = 'John Doe';
$str3 = 'David X. Cohen';
$str4 = 'Kim Jong Un';
$str5 = 'Bob';

I'd like to convert each as follows:

$str1 = 'John S.';
$str2 = 'John D.';
$str3 = 'David X. C.';
$str4 = 'Kim J. U.';
$str5 = 'Bob';

My guess is that I should first match the first word, like so:

preg_match( '/^([\w\-]+)/', $str1, $first_word );

Then I need all the words after the first one, but how do I match those? Should I call preg_match again with offset = 1 in the arguments? But that offset is in characters or bytes, right?

Anyway, once I have matched the words following the first one, if they exist, should I do something like this for each of them:

$second_word = substr( $following_word, 0, 1 ) . '. ';

Or is my approach completely wrong?

Thanks

ps - it would be a boon if the regex could keep the first two words whole when the string contains three or more words (e.g. 'Kim Jong U.').

It can be done in a single preg_replace call using a regex.

You can search using this regex:

^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+

And replace with:

$1.


Code:

$name = preg_replace('/^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+/', '$1.', $name);

Explanation:

  • (*FAIL), abbreviated to (*F) in the pattern, behaves like a failing negative assertion and is a synonym for (?!)
  • (*SKIP) marks a point past which the regex engine is not allowed to backtrack if the pattern fails later on
  • (*SKIP)(*FAIL) together offer a workaround for the restriction that you cannot use a variable-length lookbehind, which the above regex would otherwise need
  • ^\w+(?:$| +)(*SKIP)(*F) matches the first word in a name and skips it (leaves it unchanged)
  • (\w)\w+ matches every other word of two or more characters and replaces it with its first letter and a dot
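
To see the pattern above at work, here is a minimal sketch that runs it over the five sample names from the question. The function name abbreviate_after_first is made up for illustration; only the pattern and the replacement come from the answer.

```php
<?php
// Hypothetical helper; the pattern and replacement are the ones given above.
function abbreviate_after_first(string $name): string
{
    // Alternative 1 consumes the first word, then (*SKIP)(*F) discards that match.
    // Alternative 2 captures the first character of each later word of 2+ characters.
    // preg_replace returns null on error, so fall back to the input in that case.
    return preg_replace('/^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+/', '$1.', $name) ?? $name;
}

$names = ['John Smith', 'John Doe', 'David X. Cohen', 'Kim Jong Un', 'Bob'];
foreach ($names as $name) {
    echo abbreviate_after_first($name), PHP_EOL;
}
// Prints:
// John S.
// John D.
// David X. C.
// Kim J. U.
// Bob
```

The middle initial in 'David X. Cohen' survives because (\w)\w+ needs at least two word characters, so a lone 'X' never matches.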

You could use a positive lookbehind assertion:

(?<=\h)([A-Z])\w+

OR

Use this regex if you also want to turn Bob F into Bob F.

(?<=\h)([A-Z])\w*(?!\.)

Then replace the matched characters with \1. (group 1 followed by a literal dot).


The code would look like this:

preg_replace('~(?<=\h)([A-Z])\w+~', '\1.', $string);


  • (?<=\h)([A-Z]) captures each uppercase letter that is preceded by a horizontal whitespace character.

  • \w+ matches one or more word characters.

  • Replacing the matched characters with the contents of group 1 (\1) plus a dot gives you the desired output.
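
The difference between the two lookbehind patterns is easiest to see side by side. The following is a small sketch; the variable names are made up for illustration, and the sample strings are taken from the question and from the Bob F example above.

```php
<?php
// Pattern 1: a capitalised word of two or more characters after horizontal whitespace.
$strict = '~(?<=\h)([A-Z])\w+~';
// Pattern 2: also accepts a lone initial, unless a dot already follows it.
$loose = '~(?<=\h)([A-Z])\w*(?!\.)~';

$a = preg_replace($strict, '\1.', 'Kim Jong Un');    // 'Kim J. U.'
$b = preg_replace($strict, '\1.', 'Bob F');          // 'Bob F' (unchanged)
$c = preg_replace($loose, '\1.', 'Bob F');           // 'Bob F.'
$d = preg_replace($loose, '\1.', 'David X. Cohen');  // 'David X. C.'

echo implode(PHP_EOL, [$a, $b, $c, $d]), PHP_EOL;
```

Because both patterns require [A-Z], a lowercase word such as the "van" in 'Ludwig van Beethoven' is left untouched.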

A simple solution with only a look-ahead and a word boundary check:

preg_replace('~(?!^)\b(\w)\w+~', '$1.', $string);
  • (\w)\w+ is a word in the name, with the first character captured
  • (?!^)\b performs a word boundary check with \b, and (?!^) makes sure the match is not at the start of the string
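
As a quick check of this pattern, here is a minimal sketch that applies it to names from the question plus one hyphenated name. The hyphenated example is an assumption added for illustration and is not part of the original question.

```php
<?php
$pattern = '~(?!^)\b(\w)\w+~';

$results = [];
foreach (['John Smith', 'David X. Cohen', 'Kim Jong Un', 'Bob', 'Jean-Luc Picard'] as $name) {
    // Every word of 2+ characters that does not start at offset 0 is abbreviated.
    $results[$name] = preg_replace($pattern, '$1.', $name);
}

foreach ($results as $name => $short) {
    echo $name, ' => ', $short, PHP_EOL;
}
// Prints:
// John Smith => John S.
// David X. Cohen => David X. C.
// Kim Jong Un => Kim J. U.
// Bob => Bob
// Jean-Luc Picard => Jean-L. P.
```

Note that \b also fires after a hyphen, so the second half of a hyphenated first name is abbreviated as well; the lookbehind on \h shown earlier only triggers after whitespace.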

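None of the patterns above covers the postscript in the question (keeping the first two words whole when the name has three or more words). One possible way to handle it, sketched here under the assumption that words are separated by horizontal whitespace, is to split the name and abbreviate only the trailing words. The function name abbreviate_keep_two is hypothetical.

```php
<?php
// Hypothetical helper: keeps 2 leading words for names of 3+ words, otherwise 1.
function abbreviate_keep_two(string $name): string
{
    $words = preg_split('/\h+/', trim($name));
    $keep = count($words) >= 3 ? 2 : 1;

    foreach ($words as $i => $word) {
        if ($i >= $keep) {
            // Same (\w)\w+ idea as in the answers, anchored to a single word,
            // so existing initials such as "X." are left alone.
            $words[$i] = preg_replace('~^(\w)\w+$~', '$1.', $word);
        }
    }

    return implode(' ', $words);
}

echo abbreviate_keep_two('Kim Jong Un'), PHP_EOL;  // Kim Jong U.
echo abbreviate_keep_two('John Smith'), PHP_EOL;   // John S.
echo abbreviate_keep_two('Bob'), PHP_EOL;          // Bob
```

This trades the single-regex approach for a short loop, which keeps the "how many words to keep" rule in plain PHP rather than in the pattern.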

