Regular Expression to find "lastname, firstname middlename" format

I am trying to find the format "abc, def g", which is the name format "lastname, firstname middlename". I think regex is the best-suited method, but I know very little about regex. I tried learning some regex and tried a few expressions, but no luck. One additional point: there may be more than one space between the words.

This is what I tried, but it is not working:

(([A-Z][,]\s?)*([A-Z][a-z]+\s?)+([A-Z]\s?[a-z]*)*)

Need help! Any idea how I can do this so that only the format above matches?

Thanks!

ANSWER

Finally I am using

([A-Za-z]+),\\s*([A-Za-z]+)\\s*([A-Za-z]+)

Thanks to everyone for the suggestions.
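There is a catch in the final pattern. The separator between the first and middle name is \\s*, which allows zero whitespace characters. As a result, a two-part input such as "Smith, John" still matches: the regex engine backtracks and splits "John" into "Joh" and "n". The sketch below compares that pattern with a variant that uses \\s+ for this separator. The class name, the `parse` helper and the sample names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameSeparatorDemo {
    // The final pattern from the question: \s* allows ZERO spaces between first and middle name.
    static final Pattern LOOSE =
            Pattern.compile("([A-Za-z]+),\\s*([A-Za-z]+)\\s*([A-Za-z]+)");
    // Same pattern, but at least one whitespace character must separate first and middle name.
    static final Pattern STRICT =
            Pattern.compile("([A-Za-z]+),\\s*([A-Za-z]+)\\s+([A-Za-z]+)");

    // Returns "last|first|middle" when the whole input matches, otherwise null.
    static String parse(Pattern p, String input) {
        Matcher m = p.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return m.group(1) + "|" + m.group(2) + "|" + m.group(3);
    }

    public static void main(String[] args) {
        System.out.println(parse(LOOSE, "Smith, John"));          // Smith|Joh|n (spurious split)
        System.out.println(parse(STRICT, "Smith, John"));         // null
        System.out.println(parse(STRICT, "Smith,   John  Paul")); // Smith|John|Paul
    }
}
```

Keeping \\s* after the comma is harmless, because the comma already marks the boundary. Between the first and middle name, whitespace is the only boundary, so it has to be mandatory.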

I would try to avoid a complicated regex and use String.substring() and indexOf() instead, something like:

String name = "Last, First Middle";
int comma = name.indexOf(',');
int lastSpace = name.lastIndexOf(' ');
// trim() absorbs extra spaces, since there may be more than one space between the parts
String lastName = name.substring(0, comma).trim();
String firstName = name.substring(comma + 1, lastSpace).trim();
String middleName = name.substring(lastSpace + 1);
System.out.printf("first='%s' middle='%s' last='%s'%n", firstName,
            middleName, lastName);

Output is

first='First' middle='Middle' last='Last'

Your sample input is "lastname, firstname middlename". For that format, you can use the following regexp to extract the last name, first name and middle name. It allows multiple whitespace characters between the parts and both upper- and lowercase letters in each part. All three parts are mandatory:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "Lastname,   firstname   middlename";
String regexp = "([A-Za-z]+),\\s+([A-Za-z]+)\\s+([A-Za-z]+)";

Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher(input);
// group() throws IllegalStateException when there is no match, so check find() first
if (matcher.find()) {
    System.out.println("Lastname  : " + matcher.group(1));
    System.out.println("Firstname : " + matcher.group(2));
    System.out.println("Middlename: " + matcher.group(3));
}

Short summary:

([A-Za-z]+)   First capture group - matches one or more letters to extract the last name
,\\s+         Capture group is followed by a comma and one or more whitespace characters
([A-Za-z]+)   Second capture group - matches one or more letters to extract the first name
\\s+          Capture group is followed by one or more whitespace characters
([A-Za-z]+)   Third capture group - matches one or more letters to extract the middle name
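The three capture groups above can also be given names, so the calling code does not depend on group numbers. The sketch below assumes Java 7 or later, which added named groups `(?<name>...)` and `Matcher.group(String)`. The class name and the convention of returning null on failure are illustrative choices.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsDemo {
    // Same structure as above: last name, comma, whitespace, first name, whitespace, middle name.
    static final Pattern NAME = Pattern.compile(
            "(?<last>[A-Za-z]+),\\s+(?<first>[A-Za-z]+)\\s+(?<middle>[A-Za-z]+)");

    // Returns {last, first, middle}, or null when the whole input does not have that shape.
    static String[] split(String input) {
        Matcher m = NAME.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group("last"), m.group("first"), m.group("middle")};
    }

    public static void main(String[] args) {
        String[] parts = split("Lastname,   firstname   middlename");
        System.out.println(String.join(" / ", parts)); // Lastname / firstname / middlename
    }
}
```

Using `matches()` instead of `find()` also rejects inputs with extra text before or after the name.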

This only works if the names contain Latin letters only. You should probably use a more permissive match for the characters:

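String input = "Müller,   firstname  middlename";
// Reluctant (.+?) stops at the first separator, so extra spaces do not end up inside the first name
String regexp = "(.+?),\\s+(.+?)\\s+(.+)";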

This matches any character for lastname, firstname and middlename.

If the space after the comma is optional, use * instead of + for it. Only that first occurrence can be optional; otherwise there is no way to tell where the first name ends and the middle name begins:

String input = "Müller,firstname  middlename";
String regexp = "(.+?),\\s*(.+?)\\s+(.+)";
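There is a middle ground between [A-Za-z]+ (Latin letters only) and .+ (any character, including digits and punctuation): the Unicode letter class \\p{L}, which java.util.regex supports. The sketch below assumes names consist of letters only, so apostrophes and hyphens (as in "O'Gara") are still rejected. The class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeNameDemo {
    // \p{L} = any Unicode letter; names still may not contain spaces, apostrophes or hyphens.
    static final Pattern NAME =
            Pattern.compile("(\\p{L}+),\\s*(\\p{L}+)\\s+(\\p{L}+)");

    // Returns {last, first, middle}, or null when the input does not have that shape.
    static String[] parse(String input) {
        Matcher m = NAME.matcher(input.trim());
        return m.matches() ? new String[] {m.group(1), m.group(2), m.group(3)} : null;
    }

    public static void main(String[] args) {
        // "M\u00fcller" uses a Unicode escape for the umlaut to sidestep source-file encoding issues.
        String[] parts = parse("M\u00fcller,   firstname  middlename");
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
    }
}
```

Unlike .+, this still rejects inputs such as "Smith1, John Paul".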

As @Elliott mentions, there are other options, such as String.split(), or String.indexOf() combined with String.substring(). Regular expressions are often more flexible, but harder to maintain, especially complex ones.

In either case, write unit tests with as many different inputs (including invalid ones) as possible, so that you can verify that your algorithm still works after you modify it.
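The sketch below shows such a table of test inputs. It uses plain Java without a test framework so that it runs on its own; a real project would more likely use JUnit. The parser under test is a hypothetical `parse` method that wraps the `([A-Za-z]+),\\s+([A-Za-z]+)\\s+([A-Za-z]+)` pattern from the answer above. Swap in whichever implementation you actually use.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameParserTest {
    static final Pattern NAME =
            Pattern.compile("([A-Za-z]+),\\s+([A-Za-z]+)\\s+([A-Za-z]+)");

    // Hypothetical parser under test: {last, first, middle}, or null for invalid input.
    static String[] parse(String input) {
        Matcher m = NAME.matcher(input);
        return m.matches() ? new String[] {m.group(1), m.group(2), m.group(3)} : null;
    }

    // Each row: input, then expected {last, first, middle} (null = must be rejected).
    static final Object[][] CASES = {
        {"Last, First Middle", new String[] {"Last", "First", "Middle"}},
        {"Last,   First   Middle", new String[] {"Last", "First", "Middle"}},
        {"Last, First", null},         // middle name missing
        {"Last First Middle", null},   // comma missing
        {"Last,First Middle", null},   // this pattern requires whitespace after the comma
        {"", null},
    };

    // Runs every case, prints each failure, and returns the number of failures.
    static int runAll() {
        int failures = 0;
        for (Object[] c : CASES) {
            String[] actual = parse((String) c[0]);
            if (!Arrays.equals(actual, (String[]) c[1])) {
                failures++;
                System.out.println("FAIL: \"" + c[0] + "\" -> " + Arrays.toString(actual));
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(runAll() + " failure(s)");
    }
}
```

When you change the regex, new edge cases become new rows, and the existing rows guard against regressions.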

As an alternative to matching "lastname, firstname middlename" directly, you could use String.split and provide a regexp that matches the separators instead. For instance:

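import java.util.Arrays;

// Splits on any run of commas and/or whitespace; trim() avoids an empty first element
static String[] lastFirstMiddle(String input){
    String[] result = input.trim().split("[,\\s]+");
    System.out.println(Arrays.asList(result));
    return result;
}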

I tested this with the inputs

"Müller,   firstname  middlename"
"Müller,firstname  middlename"
"O'Gara, Ronan Ramón"

Note: this approach fails with surnames that contain spaces, for instance "van der Heuvel", "de Valera", "mac Piarais" or "bin Laden". Then again, the OP's original specification does not seem to allow spaces in the surname, or in the other names. (I work with a "Mary Kate"; that's her first name, not first and middle.) There's an interesting page about personal names at http://www.w3.org/International/questions/qa-personal-names
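If surnames with spaces need to be supported, one option is to split on the first comma only and then split the remainder on whitespace. The sketch below assumes the first token after the comma is the first name and any remaining tokens form the middle name. As the "Mary Kate" example shows, that is itself a guess. The class and method names are illustrative.

```java
import java.util.Arrays;

class CommaFirstSplit {
    // Assumptions: everything before the FIRST comma is the surname (spaces allowed),
    // the first token after it is the first name, and any remaining tokens form the middle name.
    static String[] lastFirstMiddle(String input) {
        String[] halves = input.split(",", 2);    // limit 2: split on the first comma only
        if (halves.length < 2) {
            return null;                          // no comma at all
        }
        String last = halves[0].trim();
        String[] given = halves[1].trim().split("\\s+", 2);
        String first = given[0];
        String middle = given.length > 1 ? given[1].replaceAll("\\s+", " ") : "";
        return new String[] {last, first, middle};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lastFirstMiddle("van der Heuvel, Jan Pieter")));
        // The guess fails for a two-word first name: "Mary Kate" becomes first="Mary", middle="Kate".
        System.out.println(Arrays.toString(lastFirstMiddle("Smith, Mary Kate")));
    }
}
```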

^([a-zA-Z]+)\s*,\s*([a-zA-Z]+)\s+([a-zA-Z]+)$

I think you are looking for this. Just grab the groups to get what you need. See the demo:

http://regex101.com/r/hQ1rP0/6
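In Java, the same pattern needs its backslashes doubled inside a string literal. Because of the ^ and $ anchors, find() only succeeds when the whole input has the expected shape, so extra words before or after the name make it fail. The sketch below illustrates this; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

class AnchoredNameDemo {
    // ^...$ anchor the match to the whole input, so extra text before or after makes it fail.
    static final Pattern NAME = Pattern.compile(
            "^([a-zA-Z]+)\\s*,\\s*([a-zA-Z]+)\\s+([a-zA-Z]+)$");

    static boolean isLastFirstMiddle(String input) {
        return NAME.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(isLastFirstMiddle("Doe , John   Paul"));     // true
        System.out.println(isLastFirstMiddle("Doe, John Paul Junior")); // false: a fourth word
    }
}
```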

I think this one will also work, and it is a bit shorter than yours:

([A-Z][a-z]*)(?:,\s*)?
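That pattern matches one capitalized word at a time, so it is meant to be applied repeatedly, collecting group 1 from each match. The sketch below shows that loop. It does not check the overall "last, first middle" shape. It also requires each name to start with an uppercase ASCII letter, so a name such as "McDonald" comes back as two words. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapitalizedWordsDemo {
    // One capitalized word, optionally followed by a comma and whitespace.
    static final Pattern WORD = Pattern.compile("([A-Z][a-z]*)(?:,\\s*)?");

    static List<String> names(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {              // each find() resumes after the previous match
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(names("Smith, John   Paul")); // [Smith, John, Paul]
        System.out.println(names("smith, john paul"));   // [] (lowercase names are not found)
    }
}
```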


Or you can split using this regex:

(,?\s+)
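With String.split(), the pattern describes the separators rather than the names. Because \s+ is mandatory, a comma with no space after it (as in "Müller,firstname") is not treated as a separator. The sketch below shows both cases; the class and method names are illustrative.

```java
import java.util.Arrays;

class SplitSeparatorsDemo {
    // Separator = optional comma followed by one or more whitespace characters.
    static String[] parts(String input) {
        return input.trim().split(",?\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parts("Smith,   John  Paul"))); // [Smith, John, Paul]
        System.out.println(Arrays.toString(parts("Smith,John Paul")));     // [Smith,John, Paul]
    }
}
```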

import re

def rearrange_name(name):
    # Extra characters (period and hyphen) are included so they are captured in the groups if present.
    result = re.search(r"^([\w \.-]*), ([\w \.-]*)$", name)
    # Inside a character class the period needs no escaping, so r"^([\w .-]*), ([\w .-]*)$" works too.
    if result is None:
        return name
    return "{} {}".format(result[2], result[1])

name = rearrange_name("Raila, Odinga M.")
print(name)

import re

def rearrange_name(name):
    result = re.search(r"^([\w \.-]*), ([\w \.-]*)$", name)
    if result is None:
        return name
    return "{} {}".format(result[2], result[1])

name = rearrange_name("Erick, Bett K.")
print(name)
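For comparison with the Java answers in this thread, here is roughly the same rearrangement in Java. One assumption to flag: Java's \w is ASCII-only by default, so the sketch passes Pattern.UNICODE_CHARACTER_CLASS to get closer to Python 3's Unicode-aware \w. The class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RearrangeName {
    // Same regex as the Python answers. UNICODE_CHARACTER_CLASS makes \w match non-ASCII
    // letters, as Python 3's \w does; without it, a name such as "Müller" would not match.
    static final Pattern LAST_COMMA_REST = Pattern.compile(
            "^([\\w .-]*), ([\\w .-]*)$", Pattern.UNICODE_CHARACTER_CLASS);

    static String rearrangeName(String name) {
        Matcher m = LAST_COMMA_REST.matcher(name);
        if (!m.find()) {
            return name;                       // not in "Last, Rest" form: return unchanged
        }
        return m.group(2) + " " + m.group(1);
    }

    public static void main(String[] args) {
        System.out.println(rearrangeName("Erick, Bett K.")); // Bett K. Erick
    }
}
```

As in the Python version, an apostrophe is not in the character class, so a name like "O'Gara, Ronan" is returned unchanged.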

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM