
Can a regular expression match a character at the beginning OR end of the string (but not both)?

I'm writing a regular expression to validate euro currency strings. It allows several different formats, since some locales use decimal points for thousands separators, some use spaces, some put the € at the beginning and some put the € at the end. Here's what I've come up with:

/^(€ ?)?\-?([1-9]{1,3}( \d{3})*|[1-9]{1,3}(\.\d{3})*|(0|([1-9]\d*)?))(,[0-9]{2})?( ?€)?$/

This is working for the following tests:

valid:

123 456,78
123.456,78
€6.954.231
€ 896.954.231
16.954.231 €
12 346 954 231€
€10,03
10,03
1,39
,03
0,10
€10567,01
€ 0,01
€1 234 567,89
€1.234.567,89

invalid:

1,234 € 1,1
50#,50
123,@€
€€500
0001
€ ,001
€0,001
12.34,56
123456.123.123456

One problem is that it also validates a string with the euro symbol on both ends, e.g. €123€. This is probably acceptable for my purposes, but is there a way to write a compact regex that allows that character at only one end, not both? Or do I just have to write one that's twice as long, checking first for a valid string with an optional € at the beginning and then for a valid string with an optional € at the end?
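A quick check of the problem described above, as a minimal sketch. It assumes the pattern is used as a JavaScript regex literal with single backslashes; the variable name is only for illustration.

```javascript
// The pattern from the question, written as a regex literal.
const originalRe = /^(€ ?)?\-?([1-9]{1,3}( \d{3})*|[1-9]{1,3}(\.\d{3})*|(0|([1-9]\d*)?))(,[0-9]{2})?( ?€)?$/;

console.log(originalRe.test("€123"));  // true
console.log(originalRe.test("123€"));  // true
console.log(originalRe.test("€123€")); // true (the unwanted case)
```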

UPDATE: The regex in the accepted answer still has a few false positives. I ended up writing a function that takes several options to customize the validator: the isCurrency function in this library. It still uses the lookahead to avoid certain edge cases, which was the key to answering this question.

With a lookahead, this would work:

^(?!€*$)(€ ?(?!.*€)(?=,?\d))?\-?([1-9]{1,3}( \d{3})*|[1-9]{1,3}(\.\d{3})*|(0|([1-9]\d*)?))(,[0-9]{2})?( ?€)?$

See: https://regex101.com/r/aR4xR8/8

@Necreaux deserves the credit for pointing at lookahead first!
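One way to sanity-check the lookahead pattern above in JavaScript is a small harness like the following. This is only a sketch: `isEuro` is a hypothetical helper name, and the sample strings cover just the euro-symbol placement, not every number format.

```javascript
// Pattern copied from the answer above; isEuro is a hypothetical helper name.
const euroRe = /^(?!€*$)(€ ?(?!.*€)(?=,?\d))?\-?([1-9]{1,3}( \d{3})*|[1-9]{1,3}(\.\d{3})*|(0|([1-9]\d*)?))(,[0-9]{2})?( ?€)?$/;

function isEuro(s) {
  return euroRe.test(s);
}

// Symbol at one end, or no symbol at all: accepted.
for (const s of ["€123", "123€", "123 €", "10,03", "€ 896.954.231"]) {
  console.log(JSON.stringify(s), isEuro(s)); // true for each
}

// Symbol at both ends, doubled, or on its own: rejected.
for (const s of ["€123€", "€€500", "€"]) {
  console.log(JSON.stringify(s), isEuro(s)); // false for each
}
```

The leading `(?!€*$)` rejects strings made only of euro symbols (including the empty string), and `(?!.*€)` after a leading € rejects any second € later in the string.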

Depending on your regex engine, you can do this with a negative lookahead.

^€(?!(.*€))
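As a minimal sketch of what this fragment does on its own in JavaScript: it only matches when the string starts with € and no further € follows anywhere. It does not validate the number, so it would still need to be combined with the rest of the pattern.

```javascript
// Leading € that must not be followed by another € anywhere in the string.
const leadingEuroOnce = /^€(?!(.*€))/;

console.log(leadingEuroOnce.test("€123"));  // true
console.log(leadingEuroOnce.test("€123€")); // false: a second € follows
console.log(leadingEuroOnce.test("123€"));  // false: does not start with €
```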

You can use this pattern:

^
(?=(.))          # you capture the first character in a lookahead
(?:€[ ]?)?
(?:
    [1-9][0-9]{0,2}
    (?:
        ([ .]) [0-9]{3} (?: \2 [0-9]{3})*
      |
        [0-9]*
    )
    (?:,[0-9]{2})?
  |
    0?,[0-9]{2}
)

(?:
    [ ]?
    (?!\1)€   # you test if the first character is not an €
)?
$

online demo

The idea is to capture the first character and to test if it isn't the same at the end.

To use it with JavaScript, which has no free-spacing mode, you need to remove the whitespace and comments:

var re = /^(?=(.))(?:€ ?)?(?:[1-9][0-9]{0,2}(?:([ .])[0-9]{3}(?:\2[0-9]{3})*|[0-9]*)(?:,[0-9]{2})?|0?,[0-9]{2})(?: ?(?!\1)€)?$/;
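The behaviour of the captured-first-character trick can be checked with a few samples. This is a sketch using the one-line pattern above unchanged; `firstCharRe` is just an illustrative name.

```javascript
// Group 1 holds the first character; (?!\1)€ refuses a trailing €
// whenever that first character was itself a €.
const firstCharRe = /^(?=(.))(?:€ ?)?(?:[1-9][0-9]{0,2}(?:([ .])[0-9]{3}(?:\2[0-9]{3})*|[0-9]*)(?:,[0-9]{2})?|0?,[0-9]{2})(?: ?(?!\1)€)?$/;

console.log(firstCharRe.test("€123"));      // true
console.log(firstCharRe.test("123€"));      // true
console.log(firstCharRe.test("€123€"));     // false: € at both ends
console.log(firstCharRe.test(",03"));       // true
console.log(firstCharRe.test("1.234 567")); // false: mixed thousands separators
```

The last case is rejected because the backreference `\2` forces every thousands separator to be the same character as the first one.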

The only advantage of the single-pattern approach is brevity. If you want performance, the best way is to write out the two possibilities explicitly:

var re = /^(?:€ ?(?:[1-9][0-9]{0,2}(?:([ .])[0-9]{3}(?:\1[0-9]{3})*|[0-9]*)(?:,[0-9]{2})?|0?,[0-9]{2})|(?:[1-9][0-9]{0,2}(?:([ .])[0-9]{3}(?:\2[0-9]{3})*|[0-9]*)(?:,[0-9]{2})?|0?,[0-9]{2})(?: ?€)?)$/;

It's longer to write, but it reduces the work the regex engine has to do.
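If typing the number part twice is the concern, one option is to assemble the two-alternative pattern from a shared fragment. The sketch below assumes a hypothetical helper, `numberPattern(n)`, whose argument is the capture-group index its backreference must point to (1 in the first alternative, 2 in the second).

```javascript
// Hypothetical helper: the number fragment, with its separator
// backreference pointing at capture group `n`.
function numberPattern(n) {
  return `(?:[1-9][0-9]{0,2}(?:([ .])[0-9]{3}(?:\\${n}[0-9]{3})*|[0-9]*)(?:,[0-9]{2})?|0?,[0-9]{2})`;
}

// Either "€ number" or "number with optional trailing €".
const twoWayRe = new RegExp(`^(?:€ ?${numberPattern(1)}|${numberPattern(2)}(?: ?€)?)$`);

console.log(twoWayRe.test("€1.234.567,89"));  // true
console.log(twoWayRe.test("1 234 567,89 €")); // true
console.log(twoWayRe.test("123"));            // true
console.log(twoWayRe.test("€123€"));          // false
```

The regex engine still sees the long pattern, so the runtime behaviour is that of the written-out version; only the source stays short.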

With regex engines that support conditional subpatterns like PCRE, you can write this:

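\A
(€[ ]?)?
(?:
    [1-9][0-9]{0,2}
    (?: ([ .]) [0-9]{3} (?:\2[0-9]{3})* | [0-9]*)
    (?:,[0-9]{2})?
  |
    0?,[0-9]{2}
)
(?(1)|(?:[ ]?€)?)
\z

Here (?(1)|(?:[ ]?€)?) is an if-then-else of the form (?(condition)yes|no). It checks whether capture group 1 (the leading €) is set: if so, nothing more is matched; otherwise an optional trailing € is allowed. The spaces are written as [ ] because the pattern is meant for extended (free-spacing) mode, where a bare space is ignored.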
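JavaScript's RegExp has no conditional subpatterns, so a rough counterpart there is to capture the leading and trailing symbols separately and make the decision in code. The following is a sketch under that assumption, not a translation of the PCRE pattern; `isEuroAmount` is a hypothetical name.

```javascript
// Group 1: optional leading €, group 3: optional trailing €.
// Group 2 is the thousands separator used by the backreference.
const captureRe = /^(€ ?)?(?:[1-9][0-9]{0,2}(?:([ .])[0-9]{3}(?:\2[0-9]{3})*|[0-9]*)(?:,[0-9]{2})?|0?,[0-9]{2})( ?€)?$/;

function isEuroAmount(s) {
  const m = captureRe.exec(s);
  if (m === null) return false;
  // Reject only when the symbol was captured at both ends.
  return !(m[1] !== undefined && m[3] !== undefined);
}

console.log(isEuroAmount("€123"));  // true
console.log(isEuroAmount("123€"));  // true
console.log(isEuroAmount("123"));   // true
console.log(isEuroAmount("€123€")); // false
```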

You can split your regex into two parts and combine them with '|': one for strings starting with € and the other for strings ending with €.

/(^(€ ?)?\-?([1-9]{1,3}( \d{3})*|[1-9]{1,3}(\.\d{3})*|(0|([1-9]\d*)?))(,[0-9]{2})?$)|(^\-?([1-9]{1,3}( \d{3})*|[1-9]{1,3}(\.\d{3})*|(0|([1-9]\d*)?))(,[0-9]{2})?( ?€)?$)/

Edit:

Sorry, I missed your last sentence. I think the easiest option is to write the regex twice as long.

This is the closest I've been able to come. It uses a negative lookahead to make sure that the string doesn't both begin and end with the euro symbol:

^(?!€.*€$)€?\s*(0|[1-9][0-9]{0,2})?([. ]?[0-9]{3})*(,[0-9]{2})?\s*€?$

See Regex 101 Demo here for full explanation and examples. As you can see it passes all of your tests, but it lets a couple of bad ones through. I'm sure the digit portion can be tweaked so that it works for you. The part that makes sure there are not two euro symbols is just this:

^(?!€.*€$)€?\s*<digit validation goes here>\s*€?$

The negative lookahead makes sure the string doesn't both start and end with the euro symbol. The pattern then matches an optional euro symbol at the start followed by any number of whitespace characters, validates the digits, and finally matches any number of whitespace characters and an optional euro symbol at the end.
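The template above can be turned into a small builder that wraps whatever digit pattern you settle on. This is a sketch: `euroAtOneEnd` is a hypothetical function name, and the digit pattern passed in below is a deliberately simple placeholder, not a full currency validator.

```javascript
// Wraps a digit pattern (given as a regex source string) in the
// "not € at both ends" template from the answer above.
function euroAtOneEnd(digits) {
  return new RegExp(`^(?!€.*€$)€?\\s*(?:${digits})\\s*€?$`);
}

// Placeholder digit pattern: plain digits with optional two-digit cents.
const simpleRe = euroAtOneEnd("\\d+(?:,\\d{2})?");

console.log(simpleRe.test("€ 12,50"));  // true
console.log(simpleRe.test("12,50 €"));  // true
console.log(simpleRe.test("12,50"));    // true
console.log(simpleRe.test("€12,50€"));  // false
```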
