
Python regex for number with or without decimals using a dot or comma as separator?

I'm just learning regex and now I'm trying to match a number which more or less represents this:

[zero or more numbers][possibly a dot or comma][zero or more numbers]

No dot or comma is also okay. So it should match the following:

1
123
123.
123.4
123.456
.456
123,  # From here it's the same but with commas instead of dot separators
123,4
123,456
,456

But it should not match the following:

0.,1
0a,1
0..1
1.1.2
100,000.99  # I know this and the one below are valid in many languages, but I simply want to reject these
100.000,99

So far I've come up with [0-9]*[.,][0-9]*, but it doesn't seem to work so well:

>>> import re
>>> r = re.compile("[0-9]*[.,][0-9]*")
>>> if r.match('0.1.'): print('it matches!')
...
it matches!
>>> if r.match('0.abc'): print('it matches!')
...
it matches!

I have the feeling I'm doing two things wrong: I don't use match correctly AND my regex is not correct. Could anybody enlighten me on what I'm doing wrong? All tips are welcome!

You need to make the [.,] part optional by adding ? after the character class, and don't forget the anchors: ^ asserts that we are at the start of the string and $ asserts that we are at the end.

^\d*[.,]?\d*$


>>> import re
>>> r = re.compile(r"^\d*[.,]?\d*$")
>>> if r.match('0.1.'): print('it matches!')
...
>>> if r.match('0.abc'): print('it matches!')
...
>>> if r.match('0.'): print('it matches!')
...
it matches!

If you don't want to allow a lone comma or dot (or the empty string, which the pattern above also accepts), add a lookahead that requires at least one digit.

^(?=.*?\d)\d*[.,]?\d*$

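A quick way to see what the lookahead changes is to run both patterns over a few edge cases. This is only a sketch; the edge-case list is my own choice and not taken from the question.

```python
import re

plain = re.compile(r"^\d*[.,]?\d*$")
with_lookahead = re.compile(r"^(?=.*?\d)\d*[.,]?\d*$")

# Hypothetical edge cases chosen for illustration.
edge_cases = ["", ".", ",", ".5", "5,"]

# Map each sample to (accepted by plain, accepted by lookahead version).
results = {
    s: (bool(plain.match(s)), bool(with_lookahead.match(s)))
    for s in edge_cases
}

for sample, (ok_plain, ok_lookahead) in results.items():
    print(f"{sample!r:6} plain={ok_plain} lookahead={ok_lookahead}")
```

The two patterns only disagree on inputs that contain no digit at all.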

Your regex would work if you add ^ at the front and $ at the back, so that the whole string has to match, and make the separator optional with {0,1}.

Try this:

^[0-9]*[.,]{0,1}[0-9]*$

import re

checklist = ['1', '123', '123.', '123.4', '123.456', '.456', '123,', '123,4', '123,456', ',456', '0.,1', '0a,1', '0..1', '1.1.2', '100,000.99', '100.000,99', '0.1.', '0.abc']

pat = re.compile(r'^[0-9]*[.,]{0,1}[0-9]*$')

for c in checklist:
    if pat.match(c):
        print('%s : it matches' % c)
    else:
        print('%s : it does not match' % c)

1 : it matches
123 : it matches
123. : it matches
123.4 : it matches
123.456 : it matches
.456 : it matches
123, : it matches
123,4 : it matches
123,456 : it matches
,456 : it matches
0.,1 : it does not match
0a,1 : it does not match
0..1 : it does not match
1.1.2 : it does not match
100,000.99 : it does not match
100.000,99 : it does not match
0.1. : it does not match
0.abc : it does not match

The problem is that you are asking for a partial match, as long as it starts at the beginning.
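To make the partial match visible, the sketch below prints the part of each string that the question's original pattern actually consumed. Only the pattern and the two sample strings from the question are used.

```python
import re

# The question's original, unanchored pattern.
pattern = re.compile(r"[0-9]*[.,][0-9]*")

# re.match only requires a match at the start of the string,
# so any trailing characters are simply left unconsumed.
matched = {s: pattern.match(s).group() for s in ("0.1.", "0.abc")}

for sample, prefix in matched.items():
    print(f"{sample!r} -> matched prefix {prefix!r}")
```

In both cases the match object is truthy even though only a prefix of the input was matched.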

One way around this is to end the regex in \Z (optionally $).

\Z Matches only at the end of the string.

and the other is to use re.fullmatch instead.

import re
help(re.match)
#>>> Help on function match in module re:
#>>>
#>>> match(pattern, string, flags=0)
#>>>     Try to apply the pattern at the start of the string, returning
#>>>     a match object, or None if no match was found.
#>>>

vs

import re
help(re.fullmatch)
#>>> Help on function fullmatch in module re:
#>>>
#>>> fullmatch(pattern, string, flags=0)
#>>>     Try to apply the pattern to all of the string, returning
#>>>     a match object, or None if no match was found.
#>>>

Note that fullmatch is new in Python 3.4.
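As a rough illustration, here is re.fullmatch applied to a pattern that is deliberately left unanchored. The helper name is_number is made up for this example. The last part shows one difference between $ and \Z that is easy to miss: $ also matches just before a trailing newline.

```python
import re

NUMBER = r"[0-9]*[.,]?[0-9]*"  # deliberately left unanchored


def is_number(text):
    """Hypothetical helper: True only if the whole string matches NUMBER."""
    return re.fullmatch(NUMBER, text) is not None


for sample in ("0123", "0.1.", "0.abc"):
    print(sample, is_number(sample))

# A trailing newline is tolerated by $, but not by \Z or fullmatch.
with_newline = "123\n"
dollar = re.match(NUMBER + r"$", with_newline) is not None
slash_z = re.match(NUMBER + r"\Z", with_newline) is not None
full = is_number(with_newline)
print(dollar, slash_z, full)
```

If input may come from a file or from user input with a trailing newline, \Z or fullmatch is the stricter choice.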

You should also make the [.,] part optional, so append a ? to that.

'?' Causes the resulting RE to match 0 or 1 repetitions of the preceding RE. ab? will match either 'a' or 'ab'.

For example:

import re
r = re.compile(r"[0-9]*[.,]?[0-9]*\Z")  # raw string, so \Z reaches the regex engine unchanged

bool(r.match('0.1.'))
#>>> False

bool(r.match('0.abc'))
#>>> False

bool(r.match('0123'))
#>>> True

How about:

(?:^|[^\d,.])\d*(?:[,.]\d+)?(?:$|[^\d,.])

If you don't want to match the empty string:

(?:^|[^\d,.])\d+(?:[,.]\d+)?(?:$|[^\d,.])

^(?=.?\d)(?!(.*?\.){2,})[\d.]+$|^(?=.?\d)(?!(.*?,){2,})[\d,]+$

Try this. It validates all cases. See the demo.

http://regex101.com/r/hS3dT7/9
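For anyone who would rather check this locally than on regex101, a small harness along these lines runs the two-alternative pattern above against the accept and reject lists from the question. The variable names are my own.

```python
import re

pattern = re.compile(
    r"^(?=.?\d)(?!(.*?\.){2,})[\d.]+$|^(?=.?\d)(?!(.*?,){2,})[\d,]+$"
)

# Sample strings copied from the question.
should_match = ["1", "123", "123.", "123.4", "123.456", ".456",
                "123,", "123,4", "123,456", ",456"]
should_not_match = ["0.,1", "0a,1", "0..1", "1.1.2",
                    "100,000.99", "100.000,99"]

# Collect any sample the pattern classifies differently from the question.
false_negatives = [s for s in should_match if not pattern.match(s)]
false_positives = [s for s in should_not_match if pattern.match(s)]

print("wrongly rejected:", false_negatives)
print("wrongly accepted:", false_positives)
```

Swapping in any of the other patterns from this thread only requires changing the re.compile line.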

Some ideas for verifying a non-empty match:

1.) Use of a lookahead to check for at least one digit:

^(?=.?\d)\d*[.,]?\d*$
  • From ^ start to $ end.
  • (?=.?\d) succeeds if the string starts with a digit, optionally preceded by one character, e.g. ,1 or 1
  • \d*[.,]?\d* Allowed sequence: \d* any number of digits, followed by an optional [.,], followed by \d*
  • Note that the . inside the lookahead is a metacharacter that stands for any character, whereas the one inside the character class [.,] matches a literal .

Instead of the positive lookahead, a negative one could also be used: ^(?!\D*$)\d*[.,]?\d*$

Test at regex101 , Regex FAQ
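The positive and negative lookahead variants should accept exactly the same strings. A minimal sketch to confirm that on a handful of samples (the sample list is my own selection, partly taken from the question):

```python
import re

positive = re.compile(r"^(?=.?\d)\d*[.,]?\d*$")
negative = re.compile(r"^(?!\D*$)\d*[.,]?\d*$")

samples = ["", ".", ",", "1", ".456", "123,", "0..1", "1.1.2"]

# Record the verdict of each variant for every sample.
verdicts = {
    s: (bool(positive.match(s)), bool(negative.match(s)))
    for s in samples
}

for sample, (pos, neg) in verdicts.items():
    print(f"{sample!r:8} positive={pos} negative={neg}")
```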


2.) Use 2 different patterns:

^(?:\d+[.,]\d*|[.,]?\d+)$
  • (?: Starts a non-capturing group for the alternation.
  • \d+[.,]\d* for matching 1. , 1,1 ,... | OR
  • [.,]?\d+ for matching 1 , ,1 ...

Test at regex101
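Because both branches of the alternation require at least one \d, this version needs no lookahead to reject empty input. A short sketch, with sample strings of my own choosing:

```python
import re

pattern = re.compile(r"^(?:\d+[.,]\d*|[.,]?\d+)$")

accepted = ["1.", "1,1", "1", ",1"]       # one digit at minimum
rejected = ["", ".", ",", "1.1.2"]        # no digit, or two separators

# Evaluate every sample once and keep the verdicts for inspection.
verdicts = {s: bool(pattern.match(s)) for s in accepted + rejected}

for sample, ok in verdicts.items():
    print(f"{sample!r:8} {ok}")
```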

If the two decimal places are mandatory, you could use the following:

^((\d){1,3},*){1,5}\.(\d){2}$

This will match the following patterns:

  • 1.00
  • 10.00
  • 100.00
  • 1,000.00
  • 10,000.00
  • 100,000.00
  • 1,000,000.00
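A short check of that pattern, as a sketch. Besides the listed values, it also tries a few inputs I picked myself; they show that the comma grouping is not strictly enforced, which may or may not matter for your use case.

```python
import re

pattern = re.compile(r"^((\d){1,3},*){1,5}\.(\d){2}$")

listed = ["1.00", "10.00", "100.00", "1,000.00",
          "10,000.00", "100,000.00", "1,000,000.00"]

# Hypothetical extra inputs, not from the answer above.
loosely_grouped = ["1234.00", "1,,2.00", "12,34.00"]
malformed = ["1.0", "1.000", ".00"]

listed_ok = all(pattern.match(s) for s in listed)
loose_ok = [bool(pattern.match(s)) for s in loosely_grouped]
malformed_ok = [bool(pattern.match(s)) for s in malformed]

print(listed_ok, loose_ok, malformed_ok)
```

The loosely grouped inputs are accepted because ,* allows any number of commas after each run of one to three digits.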

A more generic method is as follows:

import re
r=re.compile(r"^\d\d*[,]?\d*[,]?\d*[.,]?\d*\d$")
print(bool(r.match('100,000.00')))

  1. This will match the following patterns:
    • 100
    • 1,000
    • 100.00
    • 1,000.00
    • 1,00,000
    • 1,00,000.00
  2. This will not match the following patterns:
    • .100
    • ..100
    • 100.100.00
    • ,100
    • 100,
    • 100.
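The two lists above can be verified with a few lines. This is just a sketch; the extra single-digit check at the end is my own addition and shows that the pattern needs at least two digits, because it both starts and ends with a mandatory \d.

```python
import re

r = re.compile(r"^\d\d*[,]?\d*[,]?\d*[.,]?\d*\d$")

matching = ["100", "1,000", "100.00", "1,000.00", "1,00,000", "1,00,000.00"]
non_matching = [".100", "..100", "100.100.00", ",100", "100,", "100."]

# Samples that behave differently from what the lists claim.
unexpected = (
    [s for s in matching if not r.match(s)]
    + [s for s in non_matching if r.match(s)]
)
print("unexpected results:", unexpected)

# Extra check, not from the answer: a single digit is rejected.
single_digit = bool(r.match("5"))
print("single digit accepted:", single_digit)
```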

