
regex split names with special characters (dash, apostrophe)

I have a column of names in which the first and last name are concatenated, with no space between them. Splitting such names has already been asked on this site, but here some names contain dashes (-) or apostrophes ('):

Speed-WagonMario
CruiserPetey
SthesiaAnna
De’wayneJohn

I want to make sure all of these are matched by my regex:

clean_names = re.split(r'([A-Z][a-z\']+\-[A-Z][a-z\']+|[A-Z][a-z\']+)', names)

It works for dashes, which only ever appear right before an uppercase letter, but not for apostrophes.

Does anyone have an idea how to fix my regex? Thanks in advance.
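One detail worth checking: the sample line De’wayneJohn contains the typographic apostrophe ’ (U+2019), not the ASCII apostrophe ' (U+0027) that the class [a-z\'] matches, so the question's pattern stops at ’. The sketch below compares the question's pattern with a widened variant; the name WIDENED and the added \u2019 are assumptions for illustration, not part of the question.

```python
import re

# The question's pattern, unchanged: the apostrophe in [a-z\'] is ASCII U+0027 only.
ORIGINAL = r"([A-Z][a-z\']+\-[A-Z][a-z\']+|[A-Z][a-z\']+)"

# Hypothetical variant: the class also accepts the typographic apostrophe U+2019.
WIDENED = r"([A-Z][a-z'\u2019]+\-[A-Z][a-z'\u2019]+|[A-Z][a-z'\u2019]+)"

name = "De\u2019wayneJohn"  # same characters as the sample line De’wayneJohn

print(re.split(ORIGINAL, name))  # ['', 'De', '’wayne', 'John', '']
print(re.split(WIDENED, name))   # ['', 'De’wayne', '', 'John', '']
```

Because the pattern uses a capturing group, re.split still returns empty strings around the matches; filtering them out, for example with [p for p in parts if p], leaves the two names.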

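You can combine a positive lookbehind for a lowercase letter with a positive lookahead for an uppercase letter. The pattern matches the empty position between the two letters, so the split consumes no characters and both letters stay in the resulting pieces.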

/           // BEGIN EXPRESSION
(?<=[a-z])  // POSITIVE LOOKBEHIND [a-z]
(?=[A-Z])   // POSITIVE LOOKAHEAD  [A-Z]
/           // END EXPRESSION
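The annotated expression above can be written directly in Python with re.VERBOSE, which lets the comments live inside the pattern. This is a minimal sketch; the name BOUNDARY is illustrative. Note that re.split only splits on empty (zero-width) matches from Python 3.7 onward; earlier versions either ignore such matches or raise an error.

```python
import re

# The annotated expression, written with re.VERBOSE so whitespace and # comments are ignored.
BOUNDARY = re.compile(r"""
    (?<=[a-z])   # positive lookbehind: previous character is a lowercase ASCII letter
    (?=[A-Z])    # positive lookahead: next character is an uppercase ASCII letter
""", re.VERBOSE)

# Zero-width matches consume nothing, so no letters are lost at the split point.
print(BOUNDARY.split("Speed-WagonMario"))    # ['Speed-Wagon', 'Mario']
print(BOUNDARY.split("CruiserPetey"))        # ['Cruiser', 'Petey']
print(BOUNDARY.split("De\u2019wayneJohn"))   # ['De’wayne', 'John']
```

The dash and the apostrophe need no special handling here: the lookbehind only inspects the single character immediately before the boundary, so "-W" and "’w" never qualify as split points.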

Python Example

#!/usr/bin/env python3

import re

# Zero-width split point: preceded by a lowercase letter, followed by an uppercase one.
SPLIT_PATTERN = re.compile(r'(?<=[a-z])(?=[A-Z])')


def pair_to_person(pair):
    # The first piece is the last name, the second is the first name.
    return {'firstName': pair[1], 'lastName': pair[0]}


def parse_name_column(column_text):
    # One name per line; split each line at the lowercase/uppercase boundary.
    return [pair_to_person(SPLIT_PATTERN.split(line))
            for line in column_text.strip().split('\n')]


def print_list(items):
    print('\n'.join(map(str, items)))


if __name__ == '__main__':
    column_text = '''
Speed-WagonMario
CruiserPetey
SthesiaAnna
De’wayneJohn
'''

    names = parse_name_column(column_text)

    print_list(names)

Output

{'firstName': 'Mario', 'lastName': 'Speed-Wagon'}
{'firstName': 'Petey', 'lastName': 'Cruiser'}
{'firstName': 'Anna', 'lastName': 'Sthesia'}
{'firstName': 'John', 'lastName': 'De’wayne'}
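One limitation worth noting: this pattern splits at every lowercase-to-uppercase boundary, so a surname with an internal capital (the hypothetical sample McDonaldRonald) yields three pieces, and pair_to_person would report lastName 'Mc' and firstName 'Donald'. Assuming the first name (the trailing part) never contains an internal capital, one option is to split only at the last such boundary by extending the lookahead to the end of the string. LAST_BOUNDARY and split_last are hypothetical names, not part of the answer above.

```python
import re

# Assumption: the trailing first name has no internal capitals, so the split
# belongs at the LAST lowercase-to-uppercase boundary. The lookahead requires
# one uppercase letter followed only by non-uppercase characters up to the end.
LAST_BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z][^A-Z]*$)")


def split_last(name):
    """Return [lastName, firstName] (hypothetical helper for illustration)."""
    return LAST_BOUNDARY.split(name)


print(split_last("McDonaldRonald"))     # ['McDonald', 'Ronald']
print(split_last("Speed-WagonMario"))   # ['Speed-Wagon', 'Mario']
print(split_last("De\u2019wayneJohn"))  # ['De’wayne', 'John']
```

The trade-off is the opposite failure case: a first name with an internal capital would now be cut short, so which boundary to prefer depends on the data.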

JS Example

const data = `
Speed-WagonMario
CruiserPetey
SthesiaAnna
De'wayneJohn
`;

// Split each line at the zero-width lowercase/uppercase boundary.
const names = data
  .trim()
  .split('\n')
  .map(name => name.trim().split(/(?<=[a-z])(?=[A-Z])/))
  .map(pair => ({ firstName: pair[1], lastName: pair[0] }));

console.log(names);
