
Combined regex pattern to match beginning and end of string and remove a separator character

I have the following strings:

"LP, bar, company LLP, foo, LLP"
"LLP, bar, company LLP, foo, LP"
"LLP,bar, company LLP, foo,LP"  # note the absence of a space after/before comma to be removed

I am looking for a regex that takes those inputs and returns the following:

"LP bar, company LLP, foo LLP"
"LLP bar, company LLP, foo LP"
"LLP bar, company LLP, foo LP"

What I have so far is this:

import re

def fix_broken_entity_names(name):
    """
    LLP, NAME -> LLP NAME
    NAME, LP -> NAME LP
    """
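    pattern_beg = r'^(LL?P),'       # suffix at the start of the string
    pattern_end_1 = r', (LL?P)$'    # suffix at the end, space after the comma
    pattern_end_2 = r',(LL?P)$'     # suffix at the end, no space after the comma
    combined = r'|'.join((pattern_end_1, pattern_end_2, pattern_beg))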
    return re.sub(combined, r' \1', name)

When I run it, though:

>>> fix_broken_entity_names("LP, bar, company LLP, foo,LP")
Out[1]: '  bar, company LLP, foo '

I'd be very thankful for any tips or solutions:)

You can use

import re
texts = ["LP, bar, company LLP, foo, LLP","LLP, bar, company LLP, foo, LP","LLP,bar, company LLP, foo,LP"]
for text in texts:
    result = ' '.join(re.sub(r"^(LL?P)\s*,|,\s*(LL?P)$", r" \1\2 ", text).split())
    print("'{}' -> '{}'".format(text, result))

Output:

'LP, bar, company LLP, foo, LLP' -> 'LP bar, company LLP, foo LLP'
'LLP, bar, company LLP, foo, LP' -> 'LLP bar, company LLP, foo LP'
'LLP,bar, company LLP, foo,LP' -> 'LLP bar, company LLP, foo LP'

The regex is ^(LL?P)\s*,|,\s*(LL?P)$ :

  • ^(LL?P)\s*, - start of string, LLP or LP (Group 1), zero or more whitespace characters, a comma
  • | - or
  • ,\s*(LL?P)$ - a comma, zero or more whitespace characters, LP or LLP (Group 2) and then end of string.
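To see which group takes part in each alternative, here is a small sketch. The names PATTERN and matched_groups are made up for illustration. It lists the captured groups for every match: the group from the alternative that did not match comes back as None, and re.sub substitutes an empty string for it in the replacement template (Python 3.5 and later).

```python
import re

# Same pattern as above; PATTERN and matched_groups are illustrative names.
PATTERN = re.compile(r"^(LL?P)\s*,|,\s*(LL?P)$")

def matched_groups(text):
    """Return the (Group 1, Group 2) tuple for every match in text."""
    return [m.groups() for m in PATTERN.finditer(text)]

# The start-of-string alternative fills Group 1, the end-of-string one Group 2.
for groups in matched_groups("LLP,bar, company LLP, foo,LP"):
    print(groups)
```

This is also why a replacement that refers only to \1 loses the text captured by any other group in a combined pattern.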

Note that the replacement concatenates the Group 1 and Group 2 values (only one of them is non-empty for any given match) and encloses the result in single spaces. The post-processing step, ' '.join(....split()), then removes leading/trailing whitespace and shrinks whitespace inside the string to single spaces.
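The same idea can be dropped into the function from the question. This is only a sketch: the function name and docstring come from the question, while the constant EDGE_SUFFIX is a name chosen here for illustration.

```python
import re

# EDGE_SUFFIX is an illustrative name; the pattern is the one explained above.
EDGE_SUFFIX = re.compile(r"^(LL?P)\s*,|,\s*(LL?P)$")

def fix_broken_entity_names(name):
    """
    LLP, NAME -> LLP NAME
    NAME, LP -> NAME LP
    """
    # Replace the edge comma with spaces around whichever group matched.
    spaced = EDGE_SUFFIX.sub(r" \1\2 ", name)
    # split() without arguments drops leading/trailing whitespace and treats
    # any run of whitespace as a single separator.
    return " ".join(spaced.split())

print(fix_broken_entity_names("LP, bar, company LLP, foo,LP"))
```

One side effect to be aware of: the split/join step also collapses any repeated whitespace elsewhere in the name, not just the spaces added by the substitution.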

Make use of capture groups and reformat it how you wish:

regex:

([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+)

replacement:

\1 \2, \3, \4 \5

https://regex101.com/r/jcEzzy/1/
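In Python, this pattern and replacement could be applied roughly as follows. It is a sketch only, and the names FIVE_FIELDS and rejoin_five_fields are invented for illustration. Note that the pattern never checks for LP or LLP: it assumes the input has exactly five comma-separated fields and drops the first and last comma.

```python
import re

# Five comma-separated fields; spaces around each comma are consumed.
# FIVE_FIELDS and rejoin_five_fields are illustrative names.
FIVE_FIELDS = re.compile(
    r"([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+) *, *([^,\r\n]+)"
)

def rejoin_five_fields(text):
    """Drop the first and last comma of a five-field string."""
    return FIVE_FIELDS.sub(r"\1 \2, \3, \4 \5", text)

for text in ("LP, bar, company LLP, foo, LLP", "LLP,bar, company LLP, foo,LP"):
    print(rejoin_five_fields(text))
```

A string with fewer than five fields does not match at all and is returned unchanged, so this variant only fits inputs shaped exactly like the examples in the question.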
