Regular expressions: extract numbers separated by commas from strings

I need to extract numbers separated by commas from strings like this (with an arbitrary count of numbers and spaces):

Input string:               Expected result:
(1, 2,3)                    1,2,3
(1,3,4,5,77)                1,3,4,5,77
( b(2,46,8,4,5, 52)    y)   2,46,8,4,5,52
(a (3, 8,2, 1, 2, 9) x)     3,8,2,1,2,9

Try this pattern:

\((?:\s*\d+\s*,?)+\)

For example:

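using System.Text.RegularExpressions;
var input = "(1, 2,3)"; // first sample from the question
var results = Regex.Matches(input, @"\((?:\s*\d+\s*,?)+\)");
Console.WriteLine(results[0].Value); // (1, 2,3): the match keeps the spaces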
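For readers who want to try the pattern outside .NET, here is a minimal Python sketch (a translation for illustration, not the answer's own code; the helper name first_list is hypothetical). It runs the same pattern over the question's sample inputs. Because the comma is optional (",?"), the pattern also accepts numbers separated only by whitespace, such as "(1 2 3)".

```python
import re

# "(" followed by one or more "number, optional comma" pieces, then ")"
PATTERN = re.compile(r"\((?:\s*\d+\s*,?)+\)")

def first_list(text):
    """Return the first parenthesised run of numbers in text, or None (hypothetical helper)."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

samples = [
    "(1, 2,3)",
    "(1,3,4,5,77)",
    "( b(2,46,8,4,5, 52)    y)",
    "(a (3, 8,2, 1, 2, 9) x)",
]
for s in samples:
    print(f"{s!r:30} -> {first_list(s)!r}")
```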

If you'd like to convert this to a list of integers, you can do so fairly easily with LINQ:

var results = Regex.Matches(input, @"\((?:\s*(\d+)\s*,?)+\)")
                   .Cast<Match>()
                   // Skip group 0 (the whole match) inside each match, not once overall
                   .SelectMany(m => m.Groups.Cast<Group>().Skip(1))
                   // A repeated group records every iteration in its Captures collection
                   .SelectMany(g => g.Captures.Cast<Capture>())
                   .Select(c => Convert.ToInt32(c.Value));

Or in query syntax:

var results = 
    from m in Regex.Matches(input, @"\((?:\s*(\d+)\s*,?)+\)").Cast<Match>()
    from g in m.Groups.Cast<Group>().Skip(1)
    from c in g.Captures.Cast<Capture>()
    select Convert.ToInt32(c.Value);
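Python's built-in re module has no equivalent of the .NET Captures collection: a repeated group keeps only its last repetition. A rough Python counterpart of the LINQ query therefore needs a second pass over each match. The sketch below assumes that a plain \d+ scan inside the matched text is acceptable; extract_ints is a hypothetical name.

```python
import re

# Same pattern as the C# query; group 1 repeats once per number
OUTER = re.compile(r"\((?:\s*(\d+)\s*,?)+\)")

def extract_ints(text):
    """Collect the numbers from every parenthesised list in text as ints."""
    numbers = []
    for m in OUTER.finditer(text):
        # m.group(1) holds only the LAST repetition, so rescan the match for every number
        numbers.extend(int(d) for d in re.findall(r"\d+", m.group(0)))
    return numbers

print(extract_ints("(a (3, 8,2, 1, 2, 9) x)"))
print(OUTER.search("(1,2,3)").group(1))  # only the last repetition survives
```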

Is the search string always exactly in the format you posted, like this?

(number1,number2,number3) text...

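Edit: You've provided new examples; this should handle them:

    string input = "( b(2,46,8,4,5, 52)    y)";
    // Strip all spaces so the pattern does not need to allow for them
    input = input.Replace(" ", "");
    // One or more "digits," pieces followed by a final number: at least two numbers
    var result = Regex.Matches(input, @"\(([0-9]+,)+[0-9]+\)");
    Console.WriteLine(result[0]); // (2,46,8,4,5,52)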
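The same strip-then-match approach in Python, as a sketch for illustration (match_compact is a hypothetical name). Only plain spaces are removed, as in the C# version, and the pattern needs at least two numbers, so "(1)" is not matched.

```python
import re

# One or more "digits," pieces followed by a final run of digits: at least two numbers
PATTERN = re.compile(r"\(([0-9]+,)+[0-9]+\)")

def match_compact(text):
    """Remove all spaces, then return the first parenthesised list (or None)."""
    compact = text.replace(" ", "")
    m = PATTERN.search(compact)
    return m.group(0) if m else None

print(match_compact("( b(2,46,8,4,5, 52)    y)"))
```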

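Seeing as there could also be spaces, here is a suggestion that unrolls the loop. Because it has no optional comma, a run of digits cannot be divided among loop iterations in several different ways, so there is less backtracking, which makes it somewhat more efficient on larger inputs: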

@"[(]\d+(?:,\d+)*[)]"

You can of course escape the parentheses with backslashes, too. I just wanted to show an alternative, which I personally find more readable.
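In Python's re, as in .NET, [(] and [)] are one-character classes that match a literal parenthesis. The following Python sketch (for illustration; first_match is a hypothetical helper) checks that the bracket and backslash spellings behave identically, and shows that this version rejects input containing spaces.

```python
import re

# Unrolled loop: one number, then zero or more ",number" pieces
BRACKETED = re.compile(r"[(]\d+(?:,\d+)*[)]")  # parentheses written as character classes
ESCAPED = re.compile(r"\(\d+(?:,\d+)*\)")      # the same pattern with backslash escapes

def first_match(pattern, text):
    m = pattern.search(text)
    return m.group(0) if m else None

for s in ["(1,3,4,5,77)", "(1)", "x(7,8)y", "(1, 2,3)"]:
    # Both spellings must agree on every input
    assert first_match(BRACKETED, s) == first_match(ESCAPED, s)
    print(f"{s!r:16} -> {first_match(BRACKETED, s)!r}")
```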

If you eventually want to get the numbers, instead of splitting the result of the regex, you can capture them right away:

@"[(](?<numbers>\d+)(?:,(?<numbers>\d+))*[)]"

Now match.Groups["numbers"].Captures will contain all the numbers (as strings).

I totally forgot about the spaces again, so here it is with spaces (which are not part of the captures):

@"[(]\s*(?<numbers>\d+)\s*(?:,\s*(?<numbers>\d+)\s*)*[)]"
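Python's built-in re does not allow the same group name twice (compiling such a pattern raises re.error), so collecting every number in one named group does not carry over directly. A possible workaround, sketched below under that assumption, captures the whole list once and splits it on commas; numbers_in is a hypothetical name.

```python
import re

# One named group around the whole list; whitespace around numbers and commas is allowed
SPACED = re.compile(r"[(]\s*(?P<numbers>\d+(?:\s*,\s*\d+)*)\s*[)]")

def numbers_in(text):
    """Return the numbers of the first parenthesised list as strings, or [] if none."""
    m = SPACED.search(text)
    if m is None:
        return []
    # Split the captured list on commas and trim the surrounding whitespace
    return [part.strip() for part in m.group("numbers").split(",")]

try:
    re.compile(r"(?P<numbers>\d+)(?:,(?P<numbers>\d+))*")
except re.error as exc:
    print("duplicate group name rejected:", exc)

print(numbers_in("(a (3, 8,2, 1, 2, 9) x)"))
```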

I'd probably use a regular expression like this:

\((\d+(?:\s*,\s*\d+)*)\)

with PowerShell code like this:

$str = @(
    "(1, 2,3)"
  , "(1,3,4,5,77)"
  , "( b(2,46,8,4,5, 52)"
  , "(a (3, 8,2, 1, 2, 9) x)"
  , "(1)"
  , "(1 2, 3)"    # no match (no comma between 1st and 2nd number)
  , "( 1,2,3)"    # no match (leading whitespace before 1st number)
  , "(1,2,3 )"    # no match (trailing whitespace after last number)
  , "(1,2,)"      # no match (trailing comma)
)
$re  = '\((\d+(?:\s*,\s*\d+)*)\)'

$str | ? { $_ -match $re } | % { $matches[1] -replace '\s+', "" }

The regular expression matches a (sub)string that starts with an opening parenthesis, continues with a comma-separated sequence of numbers (with any amount of whitespace before or after each comma), and ends with a closing parenthesis. The whitespace is then removed by the -replace operator.
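For readers without PowerShell, a rough Python port of the script (for illustration only; numbers_or_none is a hypothetical helper) uses re.search in place of -match and re.sub in place of -replace. The inputs are the same as in the PowerShell array.

```python
import re

LIST_RE = re.compile(r"\((\d+(?:\s*,\s*\d+)*)\)")

def numbers_or_none(text):
    """Return group 1 with all whitespace removed, or None when the pattern does not match."""
    m = LIST_RE.search(text)
    return re.sub(r"\s+", "", m.group(1)) if m else None

inputs = [
    "(1, 2,3)", "(1,3,4,5,77)", "( b(2,46,8,4,5, 52)", "(a (3, 8,2, 1, 2, 9) x)",
    "(1)", "(1 2, 3)", "( 1,2,3)", "(1,2,3 )", "(1,2,)",
]
for s in inputs:
    result = numbers_or_none(s)
    if result is not None:  # like Where-Object, keep only the inputs that match
        print(result)
```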

If you don't want to match single numbers (e.g. "(1)"), change the regular expression to this:

\((\d+(?:\s*,\s*\d+)+)\)
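The only change is the final quantifier: * allows zero ", number" pieces, while + requires at least one, so a lone number no longer matches. A short Python check of that difference, for illustration (the pattern names are hypothetical):

```python
import re

ANY_COUNT = re.compile(r"\((\d+(?:\s*,\s*\d+)*)\)")     # "*": a single number is accepted
AT_LEAST_TWO = re.compile(r"\((\d+(?:\s*,\s*\d+)+)\)")  # "+": at least one comma is required

for s in ["(1)", "(1, 2)", "(7,8,9)"]:
    print(s, bool(ANY_COUNT.search(s)), bool(AT_LEAST_TWO.search(s)))
```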

If you want to allow whitespace after the opening or before the closing parenthesis, change the regular expression to this:

\(\s*(\d+(?:\s*,\s*\d+)*)\s*\)
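Here the two \s* added just inside the parentheses absorb the padding, so it stays out of group 1. A Python sketch of the difference, for illustration (STRICT and PADDED are hypothetical names):

```python
import re

STRICT = re.compile(r"\((\d+(?:\s*,\s*\d+)*)\)")
PADDED = re.compile(r"\(\s*(\d+(?:\s*,\s*\d+)*)\s*\)")  # whitespace allowed just inside "(" and ")"

for s in ["( 1,2,3)", "(1,2,3 )", "( 1 , 2 )"]:
    m = PADDED.search(s)
    # The padding is matched by the outer \s*, so group 1 starts and ends with a digit
    print(repr(s), STRICT.search(s) is not None, repr(m.group(1)))
```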


 