
RegEx to find numbers of exactly 4 digits

I have a text similar to the one below. It contains 4-digit numbers, each of which is preceded by either digit- or whitespace and followed by either ., ?, -digit or whitespace.


I need to match all of the 4-digit numbers in the first paragraph but none in the second, since those numbers do not meet my conditions.

Lorem ipsum 3400-digit, sit amet 5000 consectetur adipisicing elit. Natus, explicabo 6700? Itaque iure ipsum laboriosam, ex nemo delectus iste quia cupiditate digit-9134? Iste nam digit-2456 at voluptate est 8456-digit? At excepturi quis voluptatibus 7500.

Lorem ipsum $5000 dolor sit amet consectetur adipisicing elit. Obcaecati tempora dolorum repellat reiciendis cum soluta deserunt ex voluptatibus, nam illum veniam £5550 quidem aperiam sequi, nostrum sed? Quidem eveniet maiores #5550 autem. https://codepen.io/pen/5000/3454


There are a few similar questions already on StackOverflow. I have gone through some of them, but I still cannot get this to work. Before marking this question as a duplicate, please check that your solution finds every occurrence of a 4-digit number in the first paragraph but none in the second paragraph.

You may use the following pattern:

/(?:\bdigit-|\s|^)(\d{4})(?=[.?\s]|-digit\b|$)/gi

See the regex demo. You need to collect all Group 1 values.

Details

  • (?:\bdigit-|\s|^) - either digit- (as a whole word), whitespace or start of string
  • (\d{4}) - Group 1: four digits
  • (?=[.?\s]|-digit\b|$) - immediately to the right, there must be a ., whitespace, ?, -digit (as a whole word) or end of string. NOTE: Without a lookahead, the trailing whitespace would be consumed, so consecutive whitespace-separated matches would be left out.
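The note about the lookahead can be checked with a small sketch. The input string "1111 2222 3333" and the names `sample`, `consuming` and `lookahead` are made up for illustration. `consuming` is a hypothetical variant of the pattern in which the trailing condition is an ordinary non-capturing group, so it eats the space that the next match would need as its leading delimiter.

```javascript
// Hypothetical input: three 4-digit numbers separated by single spaces.
const sample = "1111 2222 3333";

// Illustration-only variant: the trailing delimiter is consumed by the match.
const consuming = /(?:\bdigit-|\s|^)(\d{4})(?:[.?\s]|-digit\b|$)/gi;

// Pattern from the answer: the trailing condition is a lookahead and consumes nothing.
const lookahead = /(?:\bdigit-|\s|^)(\d{4})(?=[.?\s]|-digit\b|$)/gi;

// Collect the Group 1 value of every match.
const withConsuming = Array.from(sample.matchAll(consuming), m => m[1]);
const withLookahead = Array.from(sample.matchAll(lookahead), m => m[1]);

console.log(withConsuming); // [ '1111', '3333' ]
console.log(withLookahead); // [ '1111', '2222', '3333' ]
```

In the consuming variant, the space after 1111 is already part of the first match, so 2222 has no leading whitespace left to match against and is skipped.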

JS demo:

 var strs = [
   "Lorem ipsum 3400-digit, sit amet 5000 consectetur adipisicing elit. Natus, explicabo 6700? Itaque iure ipsum laboriosam, ex nemo delectus iste quia cupiditate digit-9134? Iste nam digit-2456 at voluptate est 8456-digit? At excepturi quis voluptatibus 7500.",
   "Lorem ipsum $5000 dolor sit amet consectetur adipisicing elit. Obcaecati tempora dolorum repellat reiciendis cum soluta deserunt ex voluptatibus, nam illum veniam £5550 quidem aperiam sequi, nostrum sed? Quidem eveniet maiores #5550 autem. https://codepen.io/pen/5000/3454"
 ];
 // Group 1 captures the four digits; the delimiters around it are not captured.
 var rx = /(?:\bdigit-|\s|^)(\d{4})(?=[.?\s]|-digit\b|$)/gi;
 for (var s of strs) {
   var m, res = [];
   // exec() advances rx.lastIndex on each call and resets it to 0 once it returns null.
   while ((m = rx.exec(s))) {
     res.push(m[1]);
   }
   console.log(res);
 }

(\s|digit-)([0-9]{4})(?=-digit|\.|\?|\s)

You need an alternation (OR) at the beginning and at the end of your pattern, with four digits in the middle.

To explain further:

  • (\s|digit-) - Group 1: either whitespace or digit-
  • ([0-9]{4}) - Group 2: a digit from 0 to 9, exactly four times
  • (?=-digit|\.|\?|\s) - positive lookahead: either -digit, a . (escaped because . is a special character in regex), a question mark (also escaped for the same reason), or whitespace.
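A minimal sketch of how this second pattern could be used from JavaScript follows. The helper name `fourDigitNumbers` is hypothetical, and the two test strings are abbreviated versions of the question's paragraphs, not the full text. Since the leading delimiter is a capturing group here, the digits end up in Group 2.

```javascript
// Pattern from the answer above; the four digits are captured in Group 2.
const rx2 = /(\s|digit-)([0-9]{4})(?=-digit|\.|\?|\s)/g;

// Hypothetical helper: collect every Group 2 value found in `text`.
function fourDigitNumbers(text) {
  return Array.from(text.matchAll(rx2), m => m[2]);
}

// Abbreviated versions of the two paragraphs from the question.
const shouldMatch =
  "Lorem ipsum 3400-digit, sit amet 5000 consectetur. Explicabo 6700? " +
  "Cupiditate digit-9134? Iste nam digit-2456 at est 8456-digit? Voluptatibus 7500.";
const shouldNotMatch =
  "Lorem ipsum $5000 dolor, veniam £5550 quidem, maiores #5550 autem. " +
  "https://codepen.io/pen/5000/3454";

const matched = fourDigitNumbers(shouldMatch);
const unmatched = fourDigitNumbers(shouldNotMatch);
// This pattern has no start-of-string alternative, so a leading number is skipped.
const atStart = fourDigitNumbers("7500 first");

console.log(matched);   // [ '3400', '5000', '6700', '9134', '2456', '8456', '7500' ]
console.log(unmatched); // []
console.log(atStart);   // []
```

The last case shows one difference from the first answer: without a `^` alternative, a 4-digit number at the very beginning of the text is not found, and without a `$` alternative the same holds for one at the very end.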

Play around on Regex101


 