
What is the pattern for empty string?

I need to validate input: valid values are either a number or an empty string. What is the corresponding regular expression?

String pattern = "\\d+|<what should be here?>";

Update: please don't suggest "\\d*"; I'm just curious how to express "empty string" in a regex.

In this particular case, ^\d*$ would work, but generally speaking, to match a pattern or an empty string, you can use:

^$|pattern

Explanation

  • ^ and $ are the beginning and end of the string anchors respectively.
  • | separates alternatives, e.g. this|that.
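A minimal Java sketch of ^$|pattern, assuming the pattern is \d+ (the class and method names are made up for illustration). One detail worth knowing: | has the lowest precedence, so ^$|\d+ parses as (^$)|(\d+) and only the empty branch is anchored. That is harmless with matches(), which must consume the whole input anyway, but with find() the digit branch can match inside a longer string.

```java
import java.util.regex.Pattern;

public class EmptyOrNumber {
    // Parsed as (^$)|(\d+): only the first alternative carries anchors.
    static final Pattern EMPTY_OR_DIGITS = Pattern.compile("^$|\\d+");

    // matches() requires the whole input to match, so "12a" is rejected.
    static boolean isValid(String input) {
        return EMPTY_OR_DIGITS.matcher(input).matches();
    }

    // find() accepts a match anywhere, so the unanchored \d+ finds "12" in "12a".
    static boolean containsMatch(String input) {
        return EMPTY_OR_DIGITS.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "123", "12a", "abc"}) {
            System.out.println("\"" + s + "\": matches=" + isValid(s)
                    + ", find=" + containsMatch(s));
        }
    }
}
```

If you have to use find(), anchor every branch (^$|^\d+$) or group the alternation (^(?:|\d+)$).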

Note on multiline mode

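In the so-called multiline mode (Pattern.MULTILINE/(?m) in Java), ^ and $ match at the beginning and end of each line instead. The anchors for the beginning and end of the whole string are then \A and \Z respectively (strictly, Java's \Z also matches just before a final line terminator, while \z matches only at the very end).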

If you're in multiline mode, the empty string is matched by \A\Z instead; ^$ would match an empty line within the string.
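The difference can be checked with a small Java sketch (class and field names are illustrative). All patterns are compiled with Pattern.MULTILINE: ^$ then finds an empty line inside the text, while \A and \z still refer to the whole input. The last two checks show why \z is the stricter choice, since \Z may also match just before a final line terminator.

```java
import java.util.regex.Pattern;

public class MultilineEmpty {
    // With MULTILINE, ^ and $ match at line boundaries, so ^$ finds an empty line.
    static final Pattern EMPTY_LINE = Pattern.compile("^$", Pattern.MULTILINE);
    // \A and \z ignore MULTILINE: they anchor to the very start and very end of the input.
    static final Pattern EMPTY_INPUT_STRICT = Pattern.compile("\\A\\z", Pattern.MULTILINE);
    // \Z also tolerates a final line terminator, so the input "\n" passes this one.
    static final Pattern EMPTY_INPUT_LOOSE = Pattern.compile("\\A\\Z", Pattern.MULTILINE);

    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String text = "012\n\n345"; // contains one empty line
        System.out.println(found(EMPTY_LINE, text));         // true
        System.out.println(found(EMPTY_INPUT_STRICT, text)); // false
        System.out.println(found(EMPTY_INPUT_STRICT, ""));   // true
        System.out.println(found(EMPTY_INPUT_LOOSE, "\n"));  // true
        System.out.println(found(EMPTY_INPUT_STRICT, "\n")); // false
    }
}
```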


Examples

Here are some examples to illustrate the above points:

String numbers = "012345";

System.out.println(numbers.replaceAll(".", "<$0>"));
// <0><1><2><3><4><5>

System.out.println(numbers.replaceAll("^.", "<$0>"));
// <0>12345

System.out.println(numbers.replaceAll(".$", "<$0>"));
// 01234<5>

numbers = "012\n345\n678";
System.out.println(numbers.replaceAll("^.", "<$0>"));       
// <0>12
// 345
// 678

System.out.println(numbers.replaceAll("(?m)^.", "<$0>"));       
// <0>12
// <3>45
// <6>78

System.out.println(numbers.replaceAll("(?m).\\Z", "<$0>"));     
// 012
// 345
// 67<8>

Note on Java matches

In Java, matches attempts to match a pattern against the entire string.

This is true for String.matches, Pattern.matches and Matcher.matches.

This means that with Java's matches methods, anchors can sometimes be omitted even when other flavors, or other Java regex methods such as Matcher.find, would need them.
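A short Java sketch of that difference (the class name and sample inputs are made up). The three matches variants need no anchors because they test the whole input; Matcher.find searches for a match anywhere and therefore does need them.

```java
import java.util.regex.Pattern;

public class WholeInputMatching {
    static final String DIGITS = "\\d+";

    // Returns the five results separated by spaces, in the order computed below.
    static String summary(String input) {
        // All three matches() variants test the entire input.
        boolean viaString = input.matches(DIGITS);
        boolean viaPattern = Pattern.matches(DIGITS, input);
        boolean viaMatcher = Pattern.compile(DIGITS).matcher(input).matches();
        // find() succeeds as soon as any substring matches.
        boolean viaFind = Pattern.compile(DIGITS).matcher(input).find();
        // With explicit anchors, find() behaves like matches() again.
        boolean viaAnchoredFind = Pattern.compile("^" + DIGITS + "$").matcher(input).find();
        return viaString + " " + viaPattern + " " + viaMatcher + " " + viaFind + " " + viaAnchoredFind;
    }

    public static void main(String[] args) {
        System.out.println(summary("abc123")); // false false false true false
        System.out.println(summary("123"));    // true true true true true
    }
}
```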

/^\d*$/

Matches 0 or more digits with nothing before or after.

Explanation:

The '^' means start of the input (start of a line in multiline mode), '$' means end of the input (or line), and '*' matches 0 or more occurrences. So the pattern matches an entire input consisting of 0 or more digits.
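The anchors are what turn \d* into a validator. A quick Java check (names are illustrative): without ^ and $, find() always succeeds, because \d* can match zero digits at position 0 of any string.

```java
import java.util.regex.Pattern;

public class StarNeedsAnchors {
    static final Pattern UNANCHORED = Pattern.compile("\\d*");
    static final Pattern ANCHORED = Pattern.compile("^\\d*$");

    // An empty match at index 0 is always available, so this is true for any input.
    static boolean unanchored(String s) {
        return UNANCHORED.matcher(s).find();
    }

    // The digits (possibly none) must span the whole input.
    static boolean anchored(String s) {
        return ANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "2024", "abc"}) {
            System.out.println("\"" + s + "\": unanchored=" + unanchored(s)
                    + ", anchored=" + anchored(s));
        }
    }
}
```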

To explicitly match the empty string, use \A\Z.

You can also often see ^$ which works fine unless the option is set to allow the ^ and $ anchors to match not only at the start or end of the string but also at the start/end of each line. If your input can never contain newlines, then of course ^$ is perfectly OK.

Some regex flavors (notably JavaScript) don't support the \A and \Z anchors.

If you want to allow "empty" in the sense of "nothing or only whitespace", use \A\s*\Z or ^\s*$.
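A minimal Java sketch of the "nothing or only whitespace" check using \A\s*\Z (the class and method names are hypothetical). Note that Java's \s covers ASCII whitespace only, unless Pattern.UNICODE_CHARACTER_CLASS is set.

```java
import java.util.regex.Pattern;

public class EmptyOrWhitespace {
    // \A and \Z pin the match to the whole input; \s* allows zero or more whitespace characters.
    static final Pattern BLANK = Pattern.compile("\\A\\s*\\Z");

    static boolean isEmptyOrWhitespace(String s) {
        return BLANK.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "   ", "\t\n", " x "}) {
            System.out.println("[" + s + "] -> " + isEmptyOrWhitespace(s));
        }
    }
}
```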

Just as a funny solution, you can do:

\d+|\d{0}

A digit, zero times. Yes, it does work.
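Why it works: {0} repeats the preceding atom zero times, so \d{0} can only ever match the empty string. A quick Java check (class and method names are made up) shows it behaves like an empty pattern: the full alternation accepts "" and digits, and find() reports an empty match at every position.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroRepetitions {
    // {0} repeats \d zero times, so this pattern can only match the empty string.
    static final Pattern ZERO_DIGITS = Pattern.compile("\\d{0}");

    // The answer's pattern: one or more digits, or zero digits.
    static boolean isValid(String s) {
        return s.matches("\\d+|\\d{0}");
    }

    // Counts how many (empty) matches find() reports while scanning the input.
    static int countMatches(String s) {
        Matcher m = ZERO_DIGITS.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(isValid(""));        // true
        System.out.println(isValid("42"));      // true
        System.out.println(isValid("4a"));      // false
        System.out.println(countMatches("12")); // 3: before '1', before '2', and at the end
    }
}
```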

One way to view the set of regular languages is as the closure of the following:

  1. The language containing only the special < EMPTY_STRING > is a regular language
  2. Any single symbol from the alphabet is a regular language
  3. The concatenation of two regular languages is a regular language
  4. The union of two regular languages is a regular language
  5. The Kleene closure (star) of a regular language is a regular language

A concrete regular language is a particular element of this closure.


I didn't find a symbol for the empty string in the POSIX standard that expresses the idea from step (1).

But POSIX does have the question mark, which by definition is equivalent to the following:

(regexp|< EMPTY_STRING >)

So you can write the following in bash (with grep), Perl, and Python:

echo 9023 | grep -E "(1|90)?23"
perl -e "print 'PASS' if (qq(23) =~ /(1|90)?23/)"
python -c "import re; print(bool(re.match('^(1|90)?23$', '23')))"
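The same equivalence holds in Java. This small sketch (names are illustrative) compares (1|90)?23 with a version that spells the empty branch out as an empty alternative, (?:1|90|)23, over a few inputs; String.matches is used so that no anchors are needed.

```java
public class OptionalIsEmptyAlternative {
    static final String WITH_QUESTION_MARK = "(1|90)?23";
    static final String WITH_EMPTY_BRANCH = "(?:1|90|)23";

    // True if both patterns give the same whole-input verdict for s.
    static boolean sameVerdict(String s) {
        return s.matches(WITH_QUESTION_MARK) == s.matches(WITH_EMPTY_BRANCH);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"23", "123", "9023", "923", "0023"}) {
            System.out.println(s + ": " + s.matches(WITH_QUESTION_MARK)
                    + " / " + s.matches(WITH_EMPTY_BRANCH));
        }
    }
}
```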

Just "\\d+|" should work without any problem.
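In \d+| the empty alternative comes last, but with Java's matches() its position does not matter: if the engine takes the empty branch first and cannot reach the end of the input, it backtracks into \d+. A minimal check (class and method names are made up):

```java
public class EmptyAlternative {
    // Empty alternative last: "\\d+|".
    static boolean digitsOrEmpty(String s) {
        return s.matches("\\d+|");
    }

    // Empty alternative first: "|\\d+"; matches() backtracks into \d+ when needed.
    static boolean emptyOrDigits(String s) {
        return s.matches("|\\d+");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "123", "12a"}) {
            System.out.println("\"" + s + "\": " + digitsOrEmpty(s) + " / " + emptyOrDigits(s));
        }
    }
}
```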

To make any pattern that matches an entire string optional, i.e. to allow the pattern to match an empty string, use an optional group:

^(pattern)?$
^^       ^^^

If the regex engine supports it (as Java does), prefer a non-capturing group, since its only purpose here is to group subpatterns, not to keep the captured subvalues:

^(?:pattern)?$

The ^ matches the start of the string (\A can be used for this in many flavors), $ matches the end of the string (\z matches the very end in many flavors, including Java), and (...)? matches the subpatterns inside the parentheses one or zero times, due to the ? quantifier.
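The wrapping step can be captured in a tiny helper. This is a sketch: optionalWhole is a hypothetical name, and the argument is assumed to be a valid regex. Because the group is non-capturing and the result is anchored, it also keeps alternations like cat|dog together and works with find() as well as matches().

```java
import java.util.regex.Pattern;

public class OptionalWhole {
    // Builds ^(?:pattern)?$ : the whole input is either one match of pattern, or empty.
    static Pattern optionalWhole(String pattern) {
        return Pattern.compile("^(?:" + pattern + ")?$");
    }

    public static void main(String[] args) {
        Pattern digits = optionalWhole("\\d+");
        for (String s : new String[] {"", "42", "4a"}) {
            System.out.println("\"" + s + "\": " + digits.matcher(s).find());
        }
        // The group keeps the alternation intact: ^(?:cat|dog)?$
        Pattern pets = optionalWhole("cat|dog");
        System.out.println(pets.matcher("catdog").find());
    }
}
```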

A Java usage note: with matches(), the initial ^ and trailing $ can be omitted, so you can use

String pattern = "(?:\\d+)?";

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM