
Regex to match any number unless it is part of a specific string

Sorry if this is a dupe, I did search but couldn't seem to find something that matched my query.

I have a replacer function in Java that runs multiple regexes to find and replace specific strings.

One of them looks for numbers, and when it finds one it adds a space on either side, for example:

test123 > test 123

The regex used is "([0-9]+)" and each match is replaced with " $1 "

I have now hit an issue: in a few edge cases I need to keep the number attached to a specific string, such as a hash name. So I need to update my regex to wrap any run of digits in spaces, UNLESS it is part of a specific sequence.
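To make the problem concrete, here is a minimal sketch of the current behaviour. The class and method names are hypothetical, and the trim() call is an assumption added only to drop the leading or trailing space the replacement leaves behind.

```java
class NaiveSpacing {
    // Wrap every run of digits in spaces, with no exceptions for hash names.
    // trim() is an assumption: it removes the space left at either end.
    static String spaceAllNumbers(String input) {
        return input.replaceAll("([0-9]+)", " $1 ").trim();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"test123", "84test", "sha256", "test_md5"}) {
            System.out.println(s + " > " + spaceAllNumbers(s));
        }
    }
}
```

With this version sha256 becomes "sha 256" and test_md5 becomes "test_md 5", which is exactly the splitting that needs to be avoided.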

For example, I want the following results:

  • test123 > test 123
  • 84test > 84 test
  • test md5 > test md5
  • sha256 > sha256
  • word two sha1 > word two sha1
  • w0rd > w 0 rd
  • aisha256 > aisha 256
  • word md 5 > word md 5 etc

I've tried using a negative lookbehind to exclude words like md5, sha1, sha256, etc., but it still seems to split the numbers. I'm sure it's something simple I am doing wrong: "((?!md5)(\d+))"

So the basic rules are: any number found in the string should be surrounded by spaces UNLESS it is preceded by the word sha or md. If there is already whitespace between the number and md or sha, that whitespace should remain. sha or md may be at the start of the string OR be preceded by whitespace or an underscore, but cannot be the end of a longer word or sit in the middle of a word.
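As a cross-check on these rules, they can also be written out step by step without lookarounds. This is only a sketch under my reading of the rules: the class name, method name and the PROTECTED pattern are hypothetical, and it leaves underscores untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleSketch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");
    // Assumption: a protected name is sha or md at the start of the string,
    // or right after whitespace or an underscore.
    private static final Pattern PROTECTED = Pattern.compile("(?:^|[\\s_])(?:sha|md)$");

    static String spaceNumbers(String input) {
        Matcher m = DIGITS.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(input, last, m.start());
            // Does the text before this number end in a standalone sha or md?
            boolean keepAttached = PROTECTED.matcher(input.substring(0, m.start())).find();
            // Space before the number, unless protected, at the start, or already spaced.
            if (!keepAttached && out.length() > 0
                    && !Character.isWhitespace(out.charAt(out.length() - 1))) {
                out.append(' ');
            }
            out.append(m.group());
            // Space after the number, unless at the end or already spaced.
            if (m.end() < input.length() && !Character.isWhitespace(input.charAt(m.end()))) {
                out.append(' ');
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"test123", "84test", "sha256", "w0rd", "aisha256", "word md 5"}) {
            System.out.println(s + " > " + spaceNumbers(s));
        }
    }
}
```

Because existing whitespace is checked before a space is added, "word md 5" comes back unchanged instead of gaining a second space.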

Thanks

The following regex seems to be working:

(?<=\d)(?=\D)|(?<=\D)(?<!sha|md|^)(?=\d)|_

Just replace the above with a single space.


Java code:

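import java.util.Arrays;
import java.util.List;

// Insert a space at each digit/non-digit boundary (except right after sha or md)
// and turn every underscore into a space.
List<String> inputs = Arrays.asList("test123", "84test", "test_md5", "sha256",
                                    "word_two_sha1", "w0rd");
for (String input : inputs) {
    String output = input.replaceAll("(?<=\\d)(?=\\D)|(?<=\\D)(?<!sha|md|^)(?=\\d)|_", " ");
    System.out.println(input + " > " + output);
}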

This prints:

test123 > test 123
84test > 84 test
test_md5 > test md5
sha256 > sha256
word_two_sha1 > word two sha1
w0rd > w 0 rd

The basic strategy here is to match the zero-width boundary between a digit and a non-digit character, unless the digit is immediately preceded by sha or md, and to replace that boundary (or any underscore) with a space.
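One thing worth checking is how this pattern treats sha or md at the end of a longer word, since the lookbehind only looks at the letters directly before the digit. A small sketch (the class and method names are hypothetical) that runs the same pattern on the question's aisha256 case:

```java
class FirstPatternCheck {
    // Same pattern as above, unchanged.
    static final String PATTERN = "(?<=\\d)(?=\\D)|(?<=\\D)(?<!sha|md|^)(?=\\d)|_";

    static String apply(String input) {
        return input.replaceAll(PATTERN, " ");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"aisha256", "test_md5", "w0rd"}) {
            System.out.println(s + " > " + apply(s));
        }
    }
}
```

As far as I can tell, aisha256 is left unsplit here, whereas the question asks for "aisha 256", so this pattern fits only if sha and md never appear as the tail of a longer word.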

As an alternative, you might also use

(?<!\d|^)(?<!(?<![^\W_])(?:sha|md))(?=\d)|(?<=\d)(?!\d|$)|_

It matches either the position between a digit and a non-digit, or an underscore.

When a digit follows, the text before it must not be a standalone sha or md, that is, sha or md that is not itself preceded by a letter or digit. So sha256 and ai_sha256 stay intact, while Aisha256 is split.

Explanation

  • (?<!\d|^) If not looking back at a digit or start of string
  • (?<! If not looking back on
    • (?<![^\W_]) If not looking back on a word char except an underscore
    • (?:sha|md) Match sha or md
  • ) Close lookbehind
  • (?=\d) Assert a digit directly to the right
  • | Or
  • (?<=\d)(?!\d|$) If looking back at a digit and not looking forward at a digit or the end of the string
  • | Or
  • _ Match an underscore
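
The same breakdown can be kept next to the pattern in code by building it from commented pieces. This is just a sketch; the class, constant and method names are hypothetical, and the concatenated string is the pattern explained above.

```java
import java.util.regex.Pattern;

class AnnotatedPattern {
    // Boundary in front of a run of digits.
    static final String BEFORE_DIGITS =
          "(?<!\\d|^)"                    // not right after a digit, not at the start
        + "(?<!(?<![^\\W_])(?:sha|md))"   // not right after a standalone sha or md
        + "(?=\\d)";                      // a digit follows

    // Boundary behind a run of digits: after a digit, not before a digit or the end.
    static final String AFTER_DIGITS = "(?<=\\d)(?!\\d|$)";

    static final Pattern SPLIT =
        Pattern.compile(BEFORE_DIGITS + "|" + AFTER_DIGITS + "|_");

    static String apply(String input) {
        return SPLIT.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(apply("Aisha256"));
        System.out.println(apply("ai_sha256"));
    }
}
```

Compiling the pattern once and reusing it also avoids recompiling on every replaceAll call when many strings are processed.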


Example

String[] strings = {"Aisha256", "ai_sha256", "test123", "84test", "test md5", "sha256", "word two sha1", "w0rd", "test_md5", "sha256", "md5"};
for (String str : strings){
    System.out.println(str.replaceAll("(?<!\\d|^)(?<!(?<![^\\W_])(?:sha|md))(?=\\d)|(?<=\\d)(?!\\d|$)|_", " "));
}

Output

Aisha 256
ai sha256
test 123
84 test
test md5
sha256
word two sha1
w 0 rd
test md5
sha256
md5
