PHP regular expression: extract part of text
I need to extract data from a DB in which the records in one column are combined this way: first letter(Firstname1). Lastname1, first letter(Firstname2). Lastname2, ....
Here is an example of how I tried to solve it:
$text2= "T. Toth, M. A. Carlo de Miller, T. Stallone";
$keywords = preg_split("/,/", "$text2");
print_r($keywords);
//I got a result in this way:
//Array ( [0] => T. Toth [1] => M. A. Carlo de Miller [2] => T. Stallone )
// I want a result of the form :
//Array ( [0] => T [1] => Toth [2] => M. A. [3] => Carlo de Miller [4] => T and so on....
Does anyone have an idea of how to proceed? A solution in MySQL would be fine too.
One more variant:
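$text2= "T. Toth, M. A. Carlo de Miller, T. Stallone";
$result = array();
foreach (explode(",", $text2) as $row)
{
    // Split on dots: the last piece is the surname, the rest are initials.
    $row = explode(".", trim($row));
    $last = array_pop($row);
    $result[] = join(".", $row) . ".";
    // trim() drops the space that followed the last initial.
    $result[] = trim($last);
}
print_r($result);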
Result:
Array ( [0] => T. [1] => Toth [2] => M. A. [3] => Carlo de Miller [4] => T. [5] => Stallone )
preg_split may not be the right function for this. Try this with preg_match_all:
$text2= "T. Toth, M. A. Carlo de Miller, T. Stallone";
preg_match_all("/\w{2,}(?:\s\w{2,})*|\w\.(?:\s\w\.)*/i", $text2, $matches);
print_r($matches[0]);
This picks out names and initials while leaving out leading/trailing whitespace.
The first alternative matches a whole name: \w{2,}(?:\s\w{2,})*
The second alternative matches initials: \w\.(?:\s\w\.)*
Results in:
Array ( [0] => T. [1] => Toth [2] => M. A. [3] => Carlo de Miller [4] => T. [5] => Stallone )
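If the names in the column can contain non-ASCII letters, \w without the /u modifier will generally not match them as letters. A minimal sketch of a Unicode-aware variant follows, assuming the data is UTF-8; the sample string is made up for illustration, and \p{L} is swapped in for \w (so digits and underscores no longer count as name characters).

```php
<?php
// Hypothetical sample data with accented letters (not from the question).
$text = "T. Tóth, M. A. Carlo de Miller, Ł. Nowak";

// \p{L} matches any Unicode letter; /u makes PCRE treat pattern and subject as UTF-8.
// First alternative: words of two or more letters separated by single whitespace (the surname).
// Second alternative: one or more "letter + dot" groups (the initials).
$pattern = '/\p{L}{2,}(?:\s\p{L}{2,})*|\p{L}\.(?:\s\p{L}\.)*/u';

preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
```

As with the original pattern, surnames containing hyphens, apostrophes, or single-letter parts would be split into separate matches, so this is only a starting point.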
I think this regular expression should more or less do what you want:
/
(?:^|,) # Start of subject or comma
\s* # Optional white space
((?:[a-z]\.\s*)+) # At least one occurrence of alpha followed by dot
\s* # Consume trailing whitespace
/ix
When used in combination with the PREG_SPLIT_NO_EMPTY and PREG_SPLIT_DELIM_CAPTURE flags, this expression will obtain the result you want. The only caveat is that it will also capture some whitespace around the initials. I can't see a way to avoid this, but it can easily be trimmed off when you use the result.
$str = 'T. Toth, M. A. Carlo de Miller, T. Stallone';
$expr = '/(?:^|,)\s*((?:[a-z]\.\s*)+)\s*/i';
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;
$keywords = preg_split($expr, $str, -1, $flags);
print_r($keywords);
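One way to trim that whitespace, sketched here with the same expression and flags, is to pass the result through array_map with trim:

```php
<?php
$str = 'T. Toth, M. A. Carlo de Miller, T. Stallone';
$expr = '/(?:^|,)\s*((?:[a-z]\.\s*)+)\s*/i';
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;

// The captured initials come back as "T. " and "M. A. " (with a trailing space);
// array_map('trim', ...) strips that whitespace from every element.
$keywords = array_map('trim', preg_split($expr, $str, -1, $flags));
print_r($keywords);
```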