
Regular expression with connected words

I have been working on Python code that uses a regex to extract document IDs from text documents, where an ID can appear on any line of the text.

A document ID consists of four letters followed by a hyphen, followed by three digits, optionally ending in a letter. For example, each of the following is a valid document ID:

  1. ABCD-123
  2. ABCD-123V
  3. XKCD-999
  4. COMP-200
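
The format described above can be written down as a pattern and checked against the four examples. This is only a minimal sketch: the names `ID_PATTERN` and `is_valid_id` are illustrative, and restricting the letters to uppercase is an assumption based on the examples, not something the description states.

```python
import re

# Four uppercase letters, a hyphen, three digits, and an optional trailing letter.
# Uppercase-only is an assumption drawn from the examples in the question.
ID_PATTERN = re.compile(r"[A-Z]{4}-[0-9]{3}[A-Z]?")


def is_valid_id(candidate: str) -> bool:
    """Return True if the whole string is a document ID in the format above."""
    return ID_PATTERN.fullmatch(candidate) is not None


examples = ["ABCD-123", "ABCD-123V", "XKCD-999", "COMP-200"]
results = [is_valid_id(example) for example in examples]
print(results)
```

`fullmatch` is used here because the check is on a standalone string; finding IDs inside running text needs a search instead.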

I tried the following regular expression to find all IDs:

ids = re.findall(r"([A-Z]{4})(-)([0-9]{3})([A-Z](?![A-Za-z]))?", text.read())

This expression mostly works, but I have a problem when an ID is directly attached to a word. For example, XKCD-999James returns XKCD-999, which is correct, but XKCD-999KEight also returns XKCD-999, while the correct answer is XKCD-999K.


So basically I need a way to separate an ID from the alphabetic characters of any word attached to it.

What is the correct approach to this problem?
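
One possible direction, sketched under an assumption the question does not state: an attached word starts with a capital letter followed by lowercase letters (as in James and Eight). Under that assumption, the letter after the digits belongs to the ID unless the very next character is lowercase, so the lookahead can be narrowed from `(?![A-Za-z])` to `(?![a-z])`. The names `ID_RE` and `extract_ids` are hypothetical, and a non-capturing group is used so that `findall` returns whole IDs rather than tuples of groups.

```python
import re
from typing import List

# The optional trailing letter is accepted only when it is NOT followed by a
# lowercase letter. Assumption: an attached word looks like "James" or "Eight"
# (capital first, then lowercase), so "J" + "a" is a word start, not a suffix.
ID_RE = re.compile(r"[A-Z]{4}-[0-9]{3}(?:[A-Z](?![a-z]))?")


def extract_ids(text: str) -> List[str]:
    """Return every document ID found in the text, in order of appearance."""
    return ID_RE.findall(text)


sample = "see XKCD-999James and XKCD-999KEight, also ABCD-123V and COMP-200"
found = extract_ids(sample)
print(found)  # ['XKCD-999', 'XKCD-999K', 'ABCD-123V', 'COMP-200']
```

This pattern cannot resolve every case: if the attached word is entirely uppercase (XKCD-999JAMES), its first letter would be read as the ID suffix, and no regex can tell the two readings apart without extra information such as a list of known IDs.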


