
How to deal with compound words in regex

I am writing regexes that extract the definitions of abbreviations from a text. I have handled a number of cases, but I cannot find a solution for the case where the abbreviation has a different number of characters than its long form has words, for example because one of the words is a compound, as below.

string = 'CRC comes from the words colorectal cancer'

I would like to get 'colorectal cancer' based on its short form. Do you have any advice on what steps I should take? I thought of splitting compound words, but that would lead to other problems.

In CRC, the first word must begin with C, and the next word can begin with either R or C. If the second word begins with R, the third word must begin with C, or there is no third word at all (the final C then sits inside the second word). If the second word begins with C, the R has to come from inside the first word, and you don't need to check for a third word. An OR condition (alternation) in the regex may help. I cannot pin down exactly how without more data samples.
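The case analysis above can be sketched in Python by generating one regex per way of distributing the abbreviation's letters over words, then trying them in turn. This is only a rough sketch under stated assumptions: the names `candidate_patterns` and `find_long_form` are made up for illustration, each word is assumed to start with the first letter assigned to it and to contain any further assigned letters in order, and the abbreviation token itself is skipped with a negative lookahead.

```python
import re
from itertools import combinations


def candidate_patterns(abbr):
    """Yield one regex per way of splitting the abbreviation's letters into words.

    Hypothetical helper: for 'CRC' the splits are C|R|C, C|RC, CR|C and CRC.
    """
    n = len(abbr)
    # Negative lookahead so a word never matches the abbreviation itself.
    not_abbr = rf"(?!{re.escape(abbr)}\b)"
    # Try splits with more words first, so one letter per word is preferred.
    for cuts_count in range(n - 1, -1, -1):
        for cuts in combinations(range(1, n), cuts_count):
            bounds = [0, *cuts, n]
            words = []
            for start, end in zip(bounds, bounds[1:]):
                letters = abbr[start:end]
                # First letter starts the word; later letters may appear anywhere after it.
                word = r"\w*".join(re.escape(ch) for ch in letters) + r"\w*"
                words.append(rf"\b{not_abbr}{word}")
            yield r"\s+".join(words) + r"\b"


def find_long_form(abbr, text):
    """Return the first phrase matching any candidate pattern, or None."""
    for pattern in candidate_patterns(abbr):
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            return match.group(0)
    return None


string = 'CRC comes from the words colorectal cancer'
print(find_long_form('CRC', string))  # colorectal cancer
```

An abbreviation of n letters produces 2^(n-1) candidate patterns, which is fine for short abbreviations but grows quickly. The last candidate puts all letters into a single word, so the sketch can return false positives (for instance a lone word that happens to contain the letters in order); with real data you would probably want to restrict the search to the text near the abbreviation or add further checks.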

