
Regex to find the number of extra spaces, including trailing and leading spaces

I'm trying to count the number of extra spaces, including trailing and leading spaces in a string. There are a lot of suggestions out there, but none of them get the count exactly right.

Example (_ indicates a space):

__this is a string__with extra spaces__

should count 5 extra spaces (two leading, one in the middle, two trailing).

Here's my code:

if (my @matches = $_[0] =~ m/(\s(?=\s)|(?<=\s)\s)|^\s|\s$/g){
    push @errors, {
        "error_count" => scalar @matches,
        "error_type"  =>  "extra spaces",
    };
}

The problem with this regex is that it counts both spaces of the double space in the middle, although only one of them is extra. But if I take out one of the look-ahead/look-behind alternatives, like so:

$_[0] =~ m/\s(?=\s)|^\s|\s$/g

It then counts only one of the two extra spaces at the beginning of the string, so my test string matches only 4 spaces instead of 5.
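To reproduce both numbers, here is a minimal sketch that runs the two patterns above over the test string written with real spaces. The patterns are copied unchanged; only the variable names are new. The first pattern has exactly one capturing group and the second has none, so in both cases list-context m//g returns one element per match and the array size is the match count.

```perl
use strict;
use warnings;

# The test string from the question, written with real spaces.
my $str = "  this is a string  with extra spaces  ";

# First attempt: inside the middle double space, the look-ahead branch
# matches the first space and the look-behind branch matches the second,
# so that gap counts as 2 instead of 1.
my @first = $str =~ m/(\s(?=\s)|(?<=\s)\s)|^\s|\s$/g;

# Second attempt: without the look-behind, the second leading space is
# never matched, because ^\s can only match at position 0.
my @second = $str =~ m/\s(?=\s)|^\s|\s$/g;

print "first attempt:  ", scalar(@first),  "\n";   # 6
print "second attempt: ", scalar(@second), "\n";   # 4
```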

Try

$_[0] =~ m/^\s|(?<=\s)\s|\s(?=\s*$)/g

This should match

  1. a space at the very start of the string (if there is one),
  2. each space that follows another space,
  3. and the one trailing space that immediately follows the last non-space character (the rest of the trailing spaces are already counted by the second case).

In other words, for your example, here's which case matches each extra space (_ marks a counted space, and the digit below it is the case that matched it):

__this is a string _with extra spaces__
12                 2                 32

This also works for the edge case of all spaces:

_____
12222
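A minimal sketch that wraps this pattern in a function and checks it on the example and the all-spaces edge case. The name count_extra_spaces is hypothetical. The pattern has no capturing groups, so list-context m//g returns one element per matched whitespace character.

```perl
use strict;
use warnings;

# Hypothetical helper around the pattern from this answer. With no
# capturing groups, list-context m//g returns one element per match,
# i.e. one per extra whitespace character.
sub count_extra_spaces {
    my ($str) = @_;
    my @matches = $str =~ m/^\s|(?<=\s)\s|\s(?=\s*$)/g;
    return scalar @matches;
}

print count_extra_spaces("  this is a string  with extra spaces  "), "\n";  # 5
print count_extra_spaces("     "), "\n";                                     # 5
print count_extra_spaces("no extras here"), "\n";                            # 0
```

Because the pattern uses \s, tabs and newlines are counted as well; replace \s with a literal space if only spaces should count.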

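This regex matches all of the unnecessary spaces (a run of leading spaces is consumed as a single match, so sum the lengths of the matches rather than counting them):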

^( )+|( )(?= )|( )+$

or

$_[0] =~ m/^( )+|( )(?= )|( )+$/g

You could change the spaces to \s, but then it will also count tabs (and other whitespace, such as newlines).

This works on RegexPal.

Breakdown:

  1. ^( )+ matches any spaces connected to the start of the string
  2. ( )(?= ) matches any space that is immediately followed by another space
  3. ( )+$ matches any spaces connected to the end of the string
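Because ^( )+ consumes the whole leading run in one match, and list-context m//g returns every capturing group for every match (undef for groups that did not take part), plugging this pattern into the question's scalar @matches does not give the space count. The sketch below shows the difference and then sums match lengths instead. The name count_extra_spaces_by_length is hypothetical, and the three groups are merged into one so that $1 holds each whole match.

```perl
use strict;
use warnings;

my $str = "  this is a string  with extra spaces  ";

# Number of matches: the leading "  " is ONE match, so this gives 4.
my $match_count = () = $str =~ /^ +| (?= )| +$/g;

# With the original three capturing groups, each match contributes
# three list elements, so scalar @groups is 4 * 3 = 12.
my @groups = $str =~ /^( )+|( )(?= )|( )+$/g;

# Summing the length of each match gives the number of extra spaces.
sub count_extra_spaces_by_length {
    my ($s) = @_;
    my $extra = 0;
    # In scalar context, m//g steps through the string one match at a time.
    while ($s =~ /(^ +| (?= )| +$)/g) {
        $extra += length $1;
    }
    return $extra;
}

print "matches: $match_count\n";                                   # 4
print "captured elements: ", scalar(@groups), "\n";                 # 12
print "extra spaces: ", count_extra_spaces_by_length($str), "\n";   # 5
```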

With three simple regular expressions (and replacing spaces with underscores for clarity) you could use:

use strict;
use warnings;

my $str = "__this_is_a_string__with_extra_underscores__";

my $temp = $str;

$temp =~ s/^_+//;
$temp =~ s/_+$//;
$temp =~ s/__+/_/g;

my $num_extra_underscores = (length $str) - (length $temp);

print "The string '$str' has $num_extra_underscores extra underscores\n";
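The same strip-and-compare idea works directly on spaces. Here is a minimal sketch that applies it to real spaces and records the result in the error-hash layout used in the question. The sub names count_extra_spaces_by_stripping and check_extra_spaces are hypothetical; the sketch assumes only literal spaces (not tabs) should count and, like the question's if, only records an error when something was found.

```perl
use strict;
use warnings;

my @errors;

# Strip leading and trailing spaces, collapse inner runs, compare lengths.
sub count_extra_spaces_by_stripping {
    my ($str) = @_;
    (my $trimmed = $str) =~ s/^ +//;   # drop leading spaces
    $trimmed =~ s/ +$//;               # drop trailing spaces
    $trimmed =~ s/  +/ /g;             # collapse inner runs to one space
    return length($str) - length($trimmed);
}

# Record an error in the same shape the question uses.
sub check_extra_spaces {
    my ($str) = @_;
    my $count = count_extra_spaces_by_stripping($str);
    push @errors, {
        "error_count" => $count,
        "error_type"  => "extra spaces",
    } if $count;
    return $count;
}

check_extra_spaces("  this is a string  with extra spaces  ");
check_extra_spaces("no extras here");
print scalar(@errors), " error(s), count = ", $errors[0]{error_count}, "\n";
```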
