
Extract acronym patterns from a string using regex

I have this problem:

list_ = ["blabla S.P.A words J.R words. , words", "words words !! words s.r.l words. D.T. words", "words words I.B.M. words words."]

I would like to have:

['S.P.A', 'J.R']
['s.r.l', 'D.T.']
['I.B.M.']

I found this amazing solution, "Finding Acronyms Using Regex In Python", but it returns:

['S.P.', 'J.']
['s.r.', 'D.T.']
['I.B.M.']

How can I use that solution in my situation?

Thank you

You just need to make the final period optional. Also add a lookbehind for a space (or an anchor for the start of the string) before the first letter, so the match cannot begin in the middle of another word, and a lookahead for a space or the end of the string after the last character:

pattern = r'(?i)(?:^|(?<= ))(?:[a-z]\.)+[a-z]\.?(?= |$)'
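
As a quick check, here is a minimal sketch that applies the pattern to the list from the question with re.findall. The variable name results is only illustrative; everything else comes from the question and the answer above.

```python
import re

# Pattern from the answer: one or more "letter + period" pairs, a final
# letter, and an optional trailing period, bounded by spaces or string edges.
pattern = r'(?i)(?:^|(?<= ))(?:[a-z]\.)+[a-z]\.?(?= |$)'

list_ = [
    "blabla S.P.A words J.R words. , words",
    "words words !! words s.r.l words. D.T. words",
    "words words I.B.M. words words.",
]

# The pattern has no capturing groups, so findall returns the full matches.
results = [re.findall(pattern, s) for s in list_]

for r in results:
    print(r)
# ['S.P.A', 'J.R']
# ['s.r.l', 'D.T.']
# ['I.B.M.']
```

Note that the lookahead only accepts a space or the end of the string, so an acronym followed directly by other punctuation (for example "I.B.M.,") would not be matched by this pattern.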
