Regex: match last string up until occurrence of another string (if it occurs)

I am trying to parse a list of names to retrieve the surname, but some of my strings have a suffix that I would like to ignore (A\.?C\.?).

Have:

MR JOHN SMITH
MR JOHN TERRENCE A.C.
MR JOHN DOE AC
MR JOHN CLARK A.C
MR JOHN BOND AC.

Want:

SMITH
TERRENCE
DOE
CLARK
BOND

I think this can be achieved with a capture group and a negative look-ahead, but I am unsure how to proceed. So far I have:

(\bA\.?C\.?$)?(?(1)|\S*$)

This matches SMITH in line 1, but I am unsure what to put after ?(1) and before | to match TERRENCE, DOE, CLARK, or BOND in lines 2 to 5, respectively, or even whether this is the right approach.

Maybe this could be helpful:

([A-Z]+)(?:\s+A\.?C\.?)?$

Then take capture group 1, which holds the surname.
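
As a quick check, here is a minimal Python sketch that runs this pattern over the five sample names from the question. The question does not say which regex engine is in use, so Python's re module is an assumption, and the variable names are only illustrative.

```python
import re

# Pattern from the answer: capture the last run of capital letters, then
# allow an optional A.C. suffix (with or without dots) before the end.
SURNAME = re.compile(r"([A-Z]+)(?:\s+A\.?C\.?)?$")

names = [
    "MR JOHN SMITH",
    "MR JOHN TERRENCE A.C.",
    "MR JOHN DOE AC",
    "MR JOHN CLARK A.C",
    "MR JOHN BOND AC.",
]

surnames = []
for name in names:
    match = SURNAME.search(name)
    if match:
        # Group 1 is the surname; the suffix sits in the non-capturing group.
        surnames.append(match.group(1))

print(surnames)  # ['SMITH', 'TERRENCE', 'DOE', 'CLARK', 'BOND']
```

No conditional or look-ahead is needed: the suffix is consumed by the optional non-capturing group, so it never ends up in group 1.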

You say the characters can be letters and digits, so use:

\b([A-Za-z0-9]+)(?:\s+A\.?C\.?)?$

EXPLANATION

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [A-Za-z0-9]+             any character of: 'A' to 'Z', 'a' to
                             'z', '0' to '9' (1 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    A                        'A'
--------------------------------------------------------------------------------
    \.?                      '.' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    C                        'C'
--------------------------------------------------------------------------------
    \.?                      '.' (optional (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
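
The pattern explained above could be wrapped in a small helper, sketched here in Python (an assumption, since the thread does not name an engine). The function name extract_surname is hypothetical and the inputs with digits and mixed case are made-up examples.

```python
import re
from typing import Optional

# Same pattern as in the answer: a word of letters/digits, followed by an
# optional A.C. suffix, anchored at the end of the string.
SURNAME = re.compile(r"\b([A-Za-z0-9]+)(?:\s+A\.?C\.?)?$")


def extract_surname(name: str) -> Optional[str]:
    """Return the last word before an optional A.C. suffix, or None."""
    match = SURNAME.search(name)
    return match.group(1) if match else None


print(extract_surname("MR JOHN BOND AC."))     # BOND
print(extract_surname("Mr John Smith2 A.C."))  # Smith2
print(extract_surname("MR JOHN AC"))           # JOHN
print(extract_surname(""))                     # None
```

The third call shows a limitation worth keeping in mind: a surname that is itself AC cannot be told apart from the suffix, so the preceding word is returned instead.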
