
Regular expression to find words separated by spaces: backtracking

I have to find words separated by spaces. What is the best practice for doing this with as little backtracking as possible?

I found this solution:

Regex: \d+\s([a-zA-Z]+\s{0,1}){1,} in a sentence
Input: 1234 this is words in a sentence

So "this is words" has to be matched by the sub-pattern ([a-zA-Z]+\s{0,1}){1,}, and "in a sentence" has to be matched by the constant words at the end of the regex.

But for this input the regex101.com debugger reports 4156 steps, which looks like catastrophic backtracking. Is there any way to avoid it?

I have another, more complicated example where it takes 86000 steps and still does not validate.

The main problem is that I have to find all words separated by spaces, but at the same time the regex itself contains constant words separated by spaces. This is where the catastrophic backtracking happens.

I have to do this using Java.
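For reference, here is a minimal sketch of the original pattern written as a Java string literal. It assumes the doubled backslashes seen in the post are escaping debris, so the intended token is \s{0,1} (one optional whitespace character). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class OriginalPattern {
    // The question's regex; every backslash is doubled because this is a Java string literal.
    // Because \s{0,1} is optional, [a-zA-Z]+ may end in the middle of a word and a new
    // repetition may start there, so one word can be split in many different ways.
    static final Pattern ORIGINAL =
            Pattern.compile("\\d+\\s([a-zA-Z]+\\s{0,1}){1,} in a sentence");

    // matches() requires the whole input to fit the pattern
    static boolean matchesOriginal(String input) {
        return ORIGINAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesOriginal("1234 this is words in a sentence"));
    }
}
```

The pattern does match this input; the issue is how much work the engine does getting there, not whether it succeeds.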

You could try splitting the String into a String array, then find the size of the array after eliminating any members that do not match your definition of a word (e.g. whitespace or punctuation).

String[] mySplitString = myOriginalString.split(" ");
int words = 0; // running total of array members that look like words
for (int x = 0; x < mySplitString.length; x++) {
    if (mySplitString[x].matches("\\w.*"/*Your regex for a word here*/)) words++;
}

mySplitString is an array of Strings split from the original string. split(" ") removes the space characters, and the substrings before, after, or between them are placed into the new String array (repeated spaces leave empty strings, which fail the word check; use split("\\s+") if tabs or newlines should also act as separators). The for-loop runs through the array, checks that each member starts with a word character (a letter, digit, or underscore), and adds it to the total word count.
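A self-contained, runnable version of the split-and-count approach might look like the sketch below. The class name WordCounter, the method countWords, and the constant WORD_REGEX are hypothetical; "\\w.*" is the placeholder word regex from the answer and should be replaced with your own definition.

```java
public class WordCounter {
    // Placeholder definition of a word: starts with a letter, digit or underscore.
    static final String WORD_REGEX = "\\w.*";

    static int countWords(String myOriginalString) {
        int words = 0;
        // Split on the single space character, as in the answer.
        for (String token : myOriginalString.split(" ")) {
            // Empty strings left by repeated spaces, and tokens such as "!",
            // fail this check and are not counted.
            if (token.matches(WORD_REGEX)) {
                words++;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(countWords("1234 this is words in a sentence")); // 7
    }
}
```

This avoids backtracking across word boundaries entirely, because the regex only ever runs against one token at a time.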

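You want to find words separated by spaces, so require at least one space after each word (\s+) instead of an optional one (\s{0,1}). With the separator optional, [a-zA-Z]+ can split a single word such as "this" into "t" + "his", "th" + "is", and so on, and the engine retries those splits whenever the rest of the pattern fails. You can use this instead, which takes just 37 steps: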

\d+\s([a-zA-Z]+\s+)+in a sentence

See demo.

https://regex101.com/r/tD0dU9/4

For Java, double-escape every backslash in the string literal, i.e. \d becomes \\d.
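A minimal Java sketch of the suggested pattern, with every backslash doubled for the string literal. The class name MandatorySpacePattern and the method matchesWhole are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MandatorySpacePattern {
    // \d+\s([a-zA-Z]+\s+)+in a sentence, written as a Java string literal.
    // Each letter run must be followed by whitespace, so a shorter run fails at the
    // next \s+ immediately instead of starting a new repetition mid-word.
    static final Pattern PATTERN =
            Pattern.compile("\\d+\\s([a-zA-Z]+\\s+)+in a sentence");

    static boolean matchesWhole(String input) {
        return PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        Matcher m = PATTERN.matcher("1234 this is words in a sentence");
        if (m.matches()) {
            // A repeated capturing group keeps only its last iteration.
            System.out.println("last group match: '" + m.group(1) + "'");
        }
    }
}
```

Running main prints the last repetition of the group, "words " (with its trailing space), because Java only retains the final iteration of a repeated group. To collect every word you would still need a separate find() loop or a split.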

If I understood it right, you want to match any word separated by spaces, plus the phrase "in a sentence".

You can try the following solution:

(in a sentence)|(\S+)

As seen in this example on regex101: Example

The regex matches in 61 steps. You might run into problems with punctuation after the phrase "in a sentence", so test it on a few inputs.
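In Java, the alternation would typically be used with a find() loop rather than matches(). A minimal sketch, with the hypothetical class name ConstantOrWord and method tokens:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConstantOrWord {
    // At each position the constant phrase is tried first; otherwise any run of
    // non-whitespace characters is taken as a word.
    static final Pattern TOKEN = Pattern.compile("(in a sentence)|(\\S+)");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // group(1) is non-null only when the constant phrase matched;
            // group() returns whichever alternative matched.
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("1234 this is words in a sentence"));
    }
}
```

For "1234 this is words in a sentence" this yields [1234, this, is, words, in a sentence]. It also shows the punctuation caveat: for "1234 words in a sentence." the trailing "." comes back as a separate token, because \S+ picks it up after the constant phrase.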

I hope this helps.

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM