Can a regular expression match a character at the beginning OR end of the string (but not both)?

I'm writing a regular expression to validate euro currency strings. It allows several different formats, since some locales use decimal points for thousands separators, some use spaces, some put the € at the beginning and some put the € at the end. Here's what I've come up with:

/^(€ ?)?\-?([1-9]{1,3}( \d{3})*|[1-9]{1,3}(\.\d{3})*|(0|([1-9]\d*)?))(,[0-9]{2})?( ?€)?$/

This is working for the following tests:

valid:

123 456,78
123.456,78
€6.954.231
€ 896.954.231
16.954.231 €
12 346 954 231€
€10,03
10,03
1,39
,03
0,10
€10567,01
€ 0,01
€1 234 567,89
€1.234.567,89

invalid:

1,234 € 1,1
50#,50
123,@€
€€500
0001
€ ,001
€0,001
12.34,56
123456.123.123456

One problem with this is that it validates a string with the euro symbol on both ends, e.g. €123€. This is probably acceptable for my purposes, but is there a way to make a compact regex that only allows that character at one end and not both, or do I just have to write one that's twice as long, checking first for a valid string with an optional € at the beginning and then for a valid string with an optional € at the end?
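A quick way to see the problem, assuming a JavaScript console such as Node.js (the variable name is arbitrary). Since every part of the pattern is optional, it accepts €123€ and also degenerate inputs such as €€ and the empty string:

```javascript
// The question's regex, unchanged apart from being assigned to a variable.
const question = /^(€ ?)?\-?([1-9]{1,3}( \d{3})*|[1-9]{1,3}(\.\d{3})*|(0|([1-9]\d*)?))(,[0-9]{2})?( ?€)?$/;

console.log(question.test("123 456,78")); // true  (valid sample)
console.log(question.test("0001"));       // false (invalid sample)
console.log(question.test("€123€"));      // true  (€ on both ends slips through)
console.log(question.test("€€"));         // true  (no digits at all)
console.log(question.test(""));           // true  (empty string)
```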

UPDATE: The regex in the accepted answer still has a few false positives. I ended up writing a function that takes several options to customize the validator; it's the isCurrency function in this library. It still uses the lookahead to avoid certain edge cases, which was the key to answering this question.

With a lookahead, this would work:

^(?!€*$)(€ ?(?!.*€)(?=,?\d))?\-?([1-9]{1,3}( \d{3})*|[1-9]{1,3}(\.\d{3})*|(0|([1-9]\d*)?))(,[0-9]{2})?( ?€)?$

See: https://regex101.com/r/aR4xR8/8

@Necreaux deserves the credit for pointing out the lookahead first!
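A rough way to check this regex against the question's own lists, assuming a Node.js or browser console (variable names are made up). It accepts every valid sample, rejects every invalid one and rejects €123€. The last line shows a few digit-less strings that still get through, which may be the kind of false positive the question's update refers to:

```javascript
// The regex from this answer, copied as-is.
const accepted = /^(?!€*$)(€ ?(?!.*€)(?=,?\d))?\-?([1-9]{1,3}( \d{3})*|[1-9]{1,3}(\.\d{3})*|(0|([1-9]\d*)?))(,[0-9]{2})?( ?€)?$/;

// Test strings taken from the question.
const valid = ["123 456,78", "123.456,78", "€6.954.231", "€ 896.954.231",
  "16.954.231 €", "12 346 954 231€", "€10,03", "10,03", "1,39", ",03",
  "0,10", "€10567,01", "€ 0,01", "€1 234 567,89", "€1.234.567,89"];
const invalid = ["1,234 € 1,1", "50#,50", "123,@€", "€€500", "0001",
  "€ ,001", "€0,001", "12.34,56", "123456.123.123456"];

console.log(valid.every((s) => accepted.test(s)));  // true
console.log(invalid.some((s) => accepted.test(s))); // false
console.log(accepted.test("€123€"));                // false: € on both ends

// The digit part can still match an empty string, so these pass too.
console.log(["-", " €", "-€"].map((s) => accepted.test(s))); // [ true, true, true ]
```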

Depending on your regex engine, you can do this with a negative lookahead:

^€(?!(.*€))
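On its own this fragment only checks the leading €, so it still has to be combined with a digit pattern to validate a whole amount. A quick illustration in JavaScript (the variable name is arbitrary):

```javascript
// A leading € that is NOT followed anywhere later by another €.
const leadingOnly = /^€(?!(.*€))/;

console.log(leadingOnly.test("€123"));  // true
console.log(leadingOnly.test("€123€")); // false: the lookahead finds a second €
console.log(leadingOnly.test("123€"));  // false: there is no leading € at all
```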

You can use this pattern:

^
(?=(.))          # you capture the first character in a lookahead
(?:€[ ]?)?
(?:
    [1-9][0-9]{0,2}
    (?:
        ([ .]) [0-9]{3} (?: \2 [0-9]{3})*
      |
        [0-9]*
    )
    (?:,[0-9]{2})?
  |
    0?,[0-9]{2}
)

(?:
    [ ]?
    (?!\1)€   # allowed only if the first character is not a €
)?
$

Online demo

The idea is to capture the first character in a lookahead and, at the end, allow a trailing € only if that first character is not also a €.

To use it in JavaScript, you need to remove the formatting, since JavaScript regexes have no free-spacing (x) mode:

var re = /^(?=(.))(?:€ ?)?(?:[1-9][0-9]{0,2}(?:([ .])[0-9]{3}(?:\2[0-9]{3})*|[0-9]*)(?:,[0-9]{2})?|0?,[0-9]{2})(?: ?(?!\1)€)?$/;
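A short check of the JavaScript version, assuming a Node.js or browser console. Group 1 holds the first character, so the trailing € is rejected exactly when that character is €; the \2 backreference also forces every thousands separator to match the first one:

```javascript
const re = /^(?=(.))(?:€ ?)?(?:[1-9][0-9]{0,2}(?:([ .])[0-9]{3}(?:\2[0-9]{3})*|[0-9]*)(?:,[0-9]{2})?|0?,[0-9]{2})(?: ?(?!\1)€)?$/;

console.log(re.exec("€123")[1]);   // €     (group 1 = first character)
console.log(re.test("123€"));      // true  (first character is 1, so a trailing € is fine)
console.log(re.test("€123€"));     // false (first character is €, so (?!\1)€ fails)
console.log(re.test("1.234 567")); // false (\2 requires the same separator throughout)
```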

The only advantage of this approach is brevity. If you want performance, the best way is to write out both possibilities literally:

var re = /^(?:€ ?(?:[1-9][0-9]{0,2}(?:([ .])[0-9]{3}(?:\1[0-9]{3})*|[0-9]*)(?:,[0-9]{2})?|0?,[0-9]{2})|(?:[1-9][0-9]{0,2}(?:([ .])[0-9]{3}(?:\2[0-9]{3})*|[0-9]*)(?:,[0-9]{2})?|0?,[0-9]{2})(?: ?€)?)$/;

It's longer to write, but it reduces the work the regex engine has to do.
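As a sanity check (not a benchmark), both versions can be run over a few of the question's samples. They should agree on every input, since both allow a € at one end only. A sketch with arbitrary variable names:

```javascript
// The compact pattern and the two-branch pattern from this answer.
const compact = /^(?=(.))(?:€ ?)?(?:[1-9][0-9]{0,2}(?:([ .])[0-9]{3}(?:\2[0-9]{3})*|[0-9]*)(?:,[0-9]{2})?|0?,[0-9]{2})(?: ?(?!\1)€)?$/;
const twoBranch = /^(?:€ ?(?:[1-9][0-9]{0,2}(?:([ .])[0-9]{3}(?:\1[0-9]{3})*|[0-9]*)(?:,[0-9]{2})?|0?,[0-9]{2})|(?:[1-9][0-9]{0,2}(?:([ .])[0-9]{3}(?:\2[0-9]{3})*|[0-9]*)(?:,[0-9]{2})?|0?,[0-9]{2})(?: ?€)?)$/;

const samples = ["€6.954.231", "16.954.231 €", ",03", "€ 0,01",
  "€123€", "€€500", "0001", "12.34,56"];

// Prints each sample with the result of both patterns side by side.
// Expected: the first four print "true true", the last four "false false".
for (const s of samples) {
  console.log(JSON.stringify(s).padEnd(16), compact.test(s), twoBranch.test(s));
}
```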

With regex engines that support conditional subpatterns, such as PCRE, you can write this:

\A
(€[ ]?)?
(?:
    [1-9][0-9]{0,2}
    (?: ([ .]) [0-9]{3} (?:\2[0-9]{3})* | [0-9]*)
    (?:,[0-9]{2})?
  | 
    0?,[0-9]{2}
)
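(?(1)|(?:[ ]?€)?)
\z

Here (?(1)|(?:[ ]?€)?) is an if..then..else, (?(condition)true|false), that checks whether capture group 1 is defined: if a leading € was matched, no trailing € may follow; otherwise an optional trailing € is allowed. The pattern is laid out in free-spacing mode, so compile it with the x flag; in that mode unescaped spaces are ignored, which means a literal space has to be written as [ ].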
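JavaScript's RegExp has no conditional subpatterns, so one possible workaround is to capture both optional € parts and apply the "not both" rule in code. This is a sketch under that assumption: isEuroAmount is a hypothetical name, and the digit part is the same one used in the pattern above:

```javascript
// Group 1: optional leading €, group 2: thousands separator, group 3: optional trailing €.
const pattern = /^(€ ?)?(?:[1-9][0-9]{0,2}(?:([ .])[0-9]{3}(?:\2[0-9]{3})*|[0-9]*)(?:,[0-9]{2})?|0?,[0-9]{2})( ?€)?$/;

// Hypothetical helper emulating (?(1)...): reject when both € groups took part.
function isEuroAmount(s) {
  const m = pattern.exec(s);
  if (m === null) return false;
  // Optional groups that did not participate are undefined in the match array.
  return m[1] === undefined || m[3] === undefined;
}

console.log(isEuroAmount("€123"));  // true
console.log(isEuroAmount("123 €")); // true
console.log(isEuroAmount("€123€")); // false
```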

You can split your regex into two parts and combine them with '|': one for strings starting with € and the other for strings with € at the end.

/(^(€ ?)?\-?([1-9]{1,3}( \d{3})*|[1-9]{1,3}(\.\d{3})*|(0|([1-9]\d*)?))(,[0-9]{2})?$)|(^\-?([1-9]{1,3}( \d{3})*|[1-9]{1,3}(\.\d{3})*|(0|([1-9]\d*)?))(,[0-9]{2})?( ?€)?$)/

Edit:

Sorry, I missed your last sentence. I think the easiest approach is to write the regex twice as long.
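If you go the "twice as long" route, a template string can keep the digit part in one place so the two branches cannot drift apart. A sketch assuming the question's digit pattern, which has no backreferences, so pasting it twice only renumbers capture groups that nothing refers to; amount and splitEuro are made-up names:

```javascript
// Digit part from the question; String.raw keeps the backslashes intact.
const amount = String.raw`\-?([1-9]{1,3}( \d{3})*|[1-9]{1,3}(\.\d{3})*|(0|([1-9]\d*)?))(,[0-9]{2})?`;

// One fully anchored branch for a leading €, one for a trailing €.
const splitEuro = new RegExp(`^(€ ?)?${amount}$|^${amount}( ?€)?$`);

console.log(splitEuro.test("€123"));  // true
console.log(splitEuro.test("123 €")); // true
console.log(splitEuro.test("€123€")); // false
```

This only fixes the both-ends case; like the question's regex, it still accepts inputs with no digits, such as a lone €.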

This is the closest I've been able to come. It uses a negative lookahead to make sure that the string doesn't both begin and end with the euro symbol:

^(?!€.*€$)€?\s*(0|[1-9][0-9]{0,2})?([. ]?[0-9]{3})*(,[0-9]{2})?\s*€?$

See the Regex 101 demo for a full explanation and examples. As you can see, it passes all of your tests, but it lets a couple of bad ones through. I'm sure the digit portion can be tweaked so that it works for you. The part that makes sure there aren't two euro symbols is just this:

^(?!€.*€$)€?\s*<digit validation goes here>\s*€?$

The negative lookahead makes sure the string doesn't both start and end with the euro symbol. Then the pattern checks for an optional euro symbol at the start followed by any amount of whitespace, validates the digits, and finally checks for any amount of whitespace and an optional euro symbol at the end.
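The skeleton can be wrapped in a small helper so the digit validation is pluggable. withOptionalEuro is a hypothetical name; it takes the digit part as a regex source string and wraps it in a non-capturing group so that any | inside it stays contained:

```javascript
// Hypothetical helper: builds ^(?!€.*€$)€?\s*(?:<digits>)\s*€?$ around any digit pattern.
function withOptionalEuro(digitSource) {
  return new RegExp(`^(?!€.*€$)€?\\s*(?:${digitSource})\\s*€?$`);
}

// Example digit part: the one from this answer.
const wrapped = withOptionalEuro("(0|[1-9][0-9]{0,2})?([. ]?[0-9]{3})*(,[0-9]{2})?");

console.log(wrapped.test("€ 896.954.231")); // true
console.log(wrapped.test("123 €"));         // true
console.log(wrapped.test("€123€"));         // false: rejected by the lookahead
console.log(wrapped.test("0001"));          // true: a bad input the digit part lets through
```

With this answer's digit part, 0001 still passes, which is one of the invalid samples from the question; a stricter digit pattern can be dropped in without touching the € handling.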
