Regex to identify characters in keyword

I want to match input that consists of one or more leading characters of a keyword.

So if the keyword is "show", then s, sh, sho, or show all qualify, but all other combinations would fail.

I'm thinking look-ahead is the solution, but I'm unsure how to make each look-ahead optional and still enforce the requirement.

As in..

echo "s" | perl -ne 'print if /s(?=h)?(?=o)?(?=w)?/' should print

echo "sh" | perl -ne 'print if /s(?=h)?(?=o)?(?=w)?/' should print

echo "sho" | perl -ne 'print if /s(?=h)?(?=o)?(?=w)?/' should print

echo "show" | perl -ne 'print if /s(?=h)?(?=o)?(?=w)?/' should print

and

echo "st" | perl -ne 'print if /s(?=h)?(?=o)?(?=w)?/' should fail

echo "sto" | perl -ne 'print if /s(?=h)?(?=o)?(?=w)?/' should fail

echo "stop" | perl -ne 'print if /s(?=h)?(?=o)?(?=w)?/' should fail

etc.

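Think of it in reverse: use the input as the pattern and match it against the keyword, anchored only at the start (with a trailing $ as well, only the full keyword would match):

'show' =~ m{^\Q$input\E}

( $input holds the text to check; \Q...\E keeps any regex metacharacters in it literal )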
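
A minimal sketch of the reversed match, assuming the text to check is passed in as $input. The helper name starts_keyword is hypothetical, and the guard against empty input is an added assumption, since an empty pattern would otherwise match any keyword. The match is anchored at the start only, so a partial keyword still qualifies.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $input is a non-empty leading part of $keyword.
sub starts_keyword {
    my ($keyword, $input) = @_;
    # Assumption: empty input should not qualify (it would match any keyword).
    return 0 unless defined $input && length $input;
    # Anchored at the start only; \Q...\E treats $input as literal text.
    return $keyword =~ /^\Q$input\E/ ? 1 : 0;
}

for my $input (qw(s sh sho show st sto stop)) {
    print "$input\n" if starts_keyword("show", $input);
}
```

This should print s, sh, sho and show, one per line, and skip st, sto and stop.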

Use index rather than a regex:

perl -nle 'print if index("show", $_) == 0'

( -l removes the newline from $_ and adds one after print )

This one-liner prints the input if it's a prefix of show (i.e., if the input is a substring of show that starts at index 0). Note that an empty line also counts as a prefix, since index returns 0 for an empty substring.
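
The same check can be wrapped in a small function for use outside a one-liner. This is only a sketch: the name is_prefix is hypothetical, and rejecting empty input is an added assumption (index("show", "") returns 0, so the bare test would accept it).

```perl
use strict;
use warnings;

# Hypothetical helper: true if $input is a non-empty prefix of $keyword.
sub is_prefix {
    my ($keyword, $input) = @_;
    # Assumption: reject empty input, which index() would report at position 0.
    return 0 unless defined $input && length $input;
    # index() returns 0 only when $input occurs at the very start of $keyword.
    return index($keyword, $input) == 0 ? 1 : 0;
}

for my $candidate (qw(s sh sho show st sto stop how showw)) {
    printf "%-5s %s\n", $candidate,
        is_prefix("show", $candidate) ? "match" : "no match";
}
```

Because no regex is involved, characters such as . or * in the input need no escaping.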


If you really need a regex, I would suggest:

/^s(h(ow?)?)?$/

(Use (?: instead of ( if the overhead of capture groups matters: it behaves the same, except that it doesn't capture the group.)
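
As a quick check that both spellings accept the same inputs, here is a small sketch comparing them. The helper accepted and the list of candidates are made up for illustration.

```perl
use strict;
use warnings;

my $capturing     = qr/^s(h(ow?)?)?$/;
my $non_capturing = qr/^s(?:h(?:ow?)?)?$/;

# Hypothetical helper: return the candidates that a pattern accepts.
sub accepted {
    my ($re, @candidates) = @_;
    return grep { $_ =~ $re } @candidates;
}

my @candidates = qw(s sh sho show st sto stop how showw);
print join(",", accepted($capturing,     @candidates)), "\n";   # s,sh,sho,show
print join(",", accepted($non_capturing, @candidates)), "\n";   # s,sh,sho,show
```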

This kind of regex should be fairly easy to build programmatically with a recursive function:

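sub build_re {
  # Split the keyword into its first character and the rest.
  my ($first, $end) = split //, $_[0], 2;
  $first = quotemeta $first;  # keep regex metacharacters in the keyword literal
  # Last character: nothing left to make optional.
  return $first unless defined $end && length $end;
  return "$first(" . build_re($end) . ")?";
}

my $re = build_re("show");  # $re is now s(h(o(w)?)?)?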

print "s" =~ /^$re$/ ? 1 : 0; # 1
print "sh" =~ /^$re$/ ? 1 : 0; # 1
print "show" =~ /^$re$/ ? 1 : 0; # 1

print "showw" =~ /^$re$/ ? 1 : 0; # 0
print "how" =~ /^$re$/ ? 1 : 0; # 0

The 3rd argument of split ( 2 ) tells split to produce at most 2 fields rather than "as many as possible" (the default). This way, the split divides the input of build_re into "1st character" and "the rest". It's roughly equivalent to my ($first, $end) = $_[0] =~ /^(.)(.*)$/ (assuming that the input is on a single line).
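
To make the effect of the limit concrete, here is a small sketch comparing split with a limit, split without one, and the regex form mentioned above. The helper name head_and_rest is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split a string into its first character and the remainder.
sub head_and_rest {
    my ($str) = @_;
    my @parts = split //, $str, 2;   # at most 2 fields
    return @parts;
}

my @limited   = head_and_rest("show");     # ("s", "how")
my @unlimited = split //, "show";          # ("s", "h", "o", "w")
my @via_regex = "show" =~ /^(.)(.*)$/;     # ("s", "how")

print join("|", @limited),   "\n";   # s|how
print join("|", @unlimited), "\n";   # s|h|o|w
print join("|", @via_regex), "\n";   # s|how
```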
