
PHP regular expression: extract part of text

I need to extract data from a DB in which the records in one column are combined like this: first letter(Firstname1). Lastname1, first letter(Firstname2). Lastname2, ...

Here is an example of how I tried to solve it:

    $text2 = "T. Toth, M. A. Carlo de Miller, T. Stallone";
    $keywords = preg_split("/,/", $text2);

    print_r($keywords);

    // I got this result:

    // Array ( [0] => T. Toth [1] => M. A. Carlo de Miller [2] => T. Stallone )

    // I want a result of this form:

    // Array ( [0] => T [1] => Toth [2] => M. A. [3] => Carlo de Miller [4] => T and so on...

Does anyone have an idea of how to proceed? A solution in MySQL would be fine too.

One more variant:

$text2 = "T. Toth, M. A. Carlo de Miller, T. Stallone";
$result = array();
foreach (explode(",", $text2) as $row)
{
  // Split on dots: the last piece is the last name, the rest are initials.
  $row = explode(".", trim($row));
  $last = array_pop($row);
  $result[] = join(".", $row) . ".";
  $result[] = trim($last);
}
print_r($result);

Result:

Array ( [0] => T. [1] => Toth [2] => M. A. [3] => Carlo de Miller [4] => T. [5] => Stallone )

preg_split may not be the right function for this. Try preg_match_all instead:

$text2= "T. Toth, M. A. Carlo de Miller, T. Stallone";
preg_match_all("/\w{2,}(?:\s\w{2,})*|\w\.(?:\s\w\.)*/i", $text2, $matches);
print_r($matches[0]);

This picks out the names and the initials while leaving out leading and trailing whitespace.

The first alternative matches a whole name: \w{2,}(?:\s\w{2,})*

The second alternative matches the initials: \w\.(?:\s\w\.)*

Results in:

Array ( [0] => T. [1] => Toth [2] => M. A. [3] => Carlo de Miller [4] => T. [5] => Stallone )
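
To see what each half of the pattern contributes, the two alternatives can be run on their own. This is only an illustrative sketch: the variable names $text, $names and $initials are not part of the answer above, and the split into two calls is just for demonstration.

```php
<?php
$text = "T. Toth, M. A. Carlo de Miller, T. Stallone";

// First alternative alone: runs of words that each have at least two characters.
preg_match_all('/\w{2,}(?:\s\w{2,})*/', $text, $names);

// Second alternative alone: runs of single characters that are each followed by a dot.
preg_match_all('/\w\.(?:\s\w\.)*/', $text, $initials);

print_r($names[0]);    // the last names
print_r($initials[0]); // the initials
```

Combining both with | in a single pattern, as in the answer, returns the same pieces interleaved in the order they appear in the string.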

I think this regular expression should more or less do what you want:

/
  (?:^|,)           # Start of subject or comma
  \s*               # Optional white space
  ((?:[a-z]\.\s*)+) # At least one occurrence of alpha followed by dot
  \s*               # Consume trailing whitespace
/ix

When used with the PREG_SPLIT_NO_EMPTY and PREG_SPLIT_DELIM_CAPTURE flags, this expression produces the result you want. The only caveat is that the captured initials keep some trailing whitespace. I can't see a way to avoid this, but it is easy to trim off when you use the result.

$str = 'T. Toth, M. A. Carlo de Miller, T. Stallone';
$expr = '/(?:^|,)\s*((?:[a-z]\.\s*)+)\s*/i';
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;

$keywords = preg_split($expr, $str, -1, $flags);

print_r($keywords);
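
The trimming mentioned above could look something like the following sketch. It reuses the expression and flags from this answer; the $trimmed variable is an illustrative name, and array_map with trim is just one possible way to clean up the pieces.

```php
<?php
$str   = 'T. Toth, M. A. Carlo de Miller, T. Stallone';
$expr  = '/(?:^|,)\s*((?:[a-z]\.\s*)+)\s*/i';
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;

$keywords = preg_split($expr, $str, -1, $flags);

// Strip the whitespace that the capture group keeps after the initials.
$trimmed = array_map('trim', $keywords);

print_r($trimmed);
```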

