Regex match word followed by decimal from text

I want to match the following examples and return an array of matches.

Given text:

some word
another 50.00 
some-more 10.10 text
another word

Matches should be a word, followed by a space and then a decimal number, optionally followed by another word:

another 50.00 
some-more 10.10 text

I have the following so far:

string pat = @"\r\n[A-Za-z ]+\d+\.\d{1,2}([A-Za-z])?";
Regex r = new Regex(pat, RegexOptions.IgnoreCase);
Match m = r.Match(input);

But it only matches the first item: another 50.00

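The character class [A-Za-z ] does not account for the hyphen (-), and the pattern only matches text that directly follows a \r\n line break. In addition, Regex.Match returns only the first match; Regex.Matches returns all of them.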
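To make the diagnosis concrete, here is a minimal sketch that counts matches for the question's pattern and for the same pattern with a hyphen added to the character class. The class name, the helper CountMatches, and the assumption that the input uses \r\n line endings are illustrative, not taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class OriginalPatternDemo
{
    // Sample text from the question, assuming Windows (\r\n) line endings.
    public const string Input =
        "some word\r\nanother 50.00 \r\nsome-more 10.10 text\r\nanother word";

    // Counts how many times a pattern matches the sample text.
    public static int CountMatches(string pattern) =>
        Regex.Matches(Input, pattern, RegexOptions.IgnoreCase).Count;

    static void Main()
    {
        // Pattern from the question: no hyphen in the character class.
        string original = @"\r\n[A-Za-z ]+\d+\.\d{1,2}([A-Za-z])?";
        // Same pattern with a literal hyphen added to the class.
        string withHyphen = @"\r\n[A-Za-z -]+\d+\.\d{1,2}([A-Za-z])?";

        Console.WriteLine(CountMatches(original));   // 1
        Console.WriteLine(CountMatches(withHyphen)); // 2
    }
}
```

Even with the hyphen added, each match still begins with the line break, the trailing group ([A-Za-z])? captures at most one letter directly after the number, and a candidate on the very first line (with no \r\n before it) would be missed.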

You can use the following regex:

[\p{L}-]+\p{Zs}*\d*\.?\d{1,2}(?:\p{Zs}*[\p{L}-]+)?

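Here, [\p{L}-]+ matches one or more letters or hyphens, \p{Zs}* matches zero or more Unicode space separators (tabs and line breaks are not included, so a match never runs across lines), \d*\.?\d{1,2} matches the number, ending in one or two digits, and (?:\p{Zs}*[\p{L}-]+)? matches an optional word after the number. Note that because the dot is optional, the number part also accepts a plain integer such as 5; use \d+\.\d{1,2} instead if the decimal part must be present.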
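The breakdown above can also be written into the pattern itself. The following sketch uses RegexOptions.IgnorePatternWhitespace so that each part carries its own comment, then runs the pattern on the sample text from the question. The class name and the helper FindAll are hypothetical, and the input is assumed to use \n line endings (the result is the same with \r\n).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class AnnotatedPatternDemo
{
    // Same pattern as above, spread over lines. With IgnorePatternWhitespace,
    // whitespace in the pattern is ignored and '#' starts a comment.
    const string Pattern = @"
        [\p{L}-]+            # one or more letters or hyphens
        \p{Zs}*              # zero or more space separators
        \d*\.?\d{1,2}        # digits, optional dot, then one or two digits
        (?:                  # optional trailing word:
            \p{Zs}*          #   zero or more space separators
            [\p{L}-]+        #   one or more letters or hyphens
        )?";

    // Returns every match value found in the text.
    public static string[] FindAll(string text) =>
        Regex.Matches(text, Pattern, RegexOptions.IgnorePatternWhitespace)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    static void Main()
    {
        string text = "some word\nanother 50.00 \nsome-more 10.10 text\nanother word";
        foreach (string hit in FindAll(text))
            Console.WriteLine($"[{hit}]");
        // [another 50.00]
        // [some-more 10.10 text]
    }
}
```

The brackets in the output show that the trailing space after 50.00 is not part of the match: the optional group only consumes spaces when a word follows them.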

Here is a C# snippet that collects all occurrences with the Regex.Matches method:

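// Requires: using System.Linq; using System.Text.RegularExpressions;
// str is the input text; res ends up as a List<string> of match values.
var res = Regex.Matches(str, @"[\p{L}-]+\p{Zs}*\d*\.?\d{1,2}(?:\p{Zs}*[\p{L}-]+)?")
              .Cast<Match>()
              .Select(p => p.Value)
              .ToList();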

Just FYI: if you need to match whole words, you can also use word boundaries (\b):

\b[\p{L}-]+\p{Zs}*\d*\.?\d{1,2}(?:\p{Zs}*[\p{L}-]+)?\b
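As a rough illustration of what the leading \b changes, the sketch below runs both variants on a made-up input where the letters are glued to a preceding digit. The class name, the helper FirstMatch, and the input string are assumptions for demonstration only.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordBoundaryDemo
{
    const string Core = @"[\p{L}-]+\p{Zs}*\d*\.?\d{1,2}(?:\p{Zs}*[\p{L}-]+)?";

    // Returns the first match value; Match.Value is "" when nothing matches.
    public static string FirstMatch(string text, bool wholeWords)
    {
        string pattern = wholeWords ? @"\b" + Core + @"\b" : Core;
        return Regex.Match(text, pattern).Value;
    }

    static void Main()
    {
        string text = "1abc 2.50"; // hypothetical input: letters glued to a digit

        // Without \b the match may start in the middle of a token.
        Console.WriteLine($"[{FirstMatch(text, wholeWords: false)}]"); // [abc 2.50]

        // With \b there is no boundary between '1' and 'a', so nothing matches.
        Console.WriteLine($"[{FirstMatch(text, wholeWords: true)}]");  // []
    }
}
```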

One more note: if you need to match diacritics too, you can add \p{M} to the character classes containing \p{L}:

\b[\p{L}\p{M}-]+\p{Zs}*\d*\.?\d{1,2}(?:\p{Zs}*[\p{L}\p{M}-]+)?\b
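Whether \p{M} is needed depends on how the accented letters are encoded. A small sketch, with word boundaries left out for brevity and with a hypothetical class name, helper, and sample strings: a precomposed é is a single letter and already matches \p{L}, while a decomposed e followed by a combining accent only matches once \p{M} is in the class.

```csharp
using System;
using System.Text.RegularExpressions;

static class CombiningMarkDemo
{
    const string LettersOnly = @"[\p{L}-]+\p{Zs}*\d*\.?\d{1,2}(?:\p{Zs}*[\p{L}-]+)?";
    const string WithMarks   = @"[\p{L}\p{M}-]+\p{Zs}*\d*\.?\d{1,2}(?:\p{Zs}*[\p{L}\p{M}-]+)?";

    // True when the pattern matches anywhere in the text.
    public static bool Hits(string text, string pattern) => Regex.IsMatch(text, pattern);

    static void Main()
    {
        string precomposed = "caf\u00E9 3.50";  // é as a single code point (a letter)
        string decomposed  = "cafe\u0301 3.50"; // e + U+0301 COMBINING ACUTE ACCENT (a mark)

        Console.WriteLine(Hits(precomposed, LettersOnly)); // True
        Console.WriteLine(Hits(decomposed, LettersOnly));  // False
        Console.WriteLine(Hits(decomposed, WithMarks));    // True
    }

    // Exposed so the two patterns can be exercised from outside the class.
    public static string LettersOnlyPattern => LettersOnly;
    public static string WithMarksPattern => WithMarks;
}
```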
