
Regex to remove everything but numbers and one character

I need to remove everything from a string except the digits and, if present, one following letter. The string is a street address from which I need to extract the house number. There may be more content after the house number, but not necessarily.

The original string is something like

Wagnerstrasse 3a platz53,eingang 3,Zi.3005 

I extract the street with number like this:

preg_match('/^([^\d]*[^\d\s]) *(\d.*)$/', $address, $match);

Then, I do an if statement on "Wagnerstrasse 3a"

if (preg_replace("/[^0-9]/","",$match[2]) == $match[2])

I need to change the regex so that it also keeps one following letter, even if there is a space in between, but only if it is a single letter, so that my if condition is true in that case. Better still would be a regex that removes everything except the following:

Wagnerstrasse 3a       <-- expected result: 3a
Wagnerstrasse 3 a      <-- expected result: 3 a
Wagnerstrasse 3        <-- expected result: 3
Wagnerstrasse 3 a bac  <-- expected result: 3 a

You can try something like this that uses word boundaries:

preg_match('~\b\d+(?: ?[a-z])?\b~', $txt, $m)

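The letter is in an optional group with an optional space before it. Even if there is no letter, the final word boundary still matches between the last digit and whatever follows (space, comma, end of the string...). Note that [a-z] is case-sensitive; add the i modifier if an uppercase letter such as "3A" should be accepted too.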
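As a rough check of the pattern above against the four cases from the question, something like the following could be used. The helper name extractHouseNumber is made up for this sketch, and the i modifier is an addition that is not in the pattern as posted.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the house
// number with an optional single letter, or null when nothing matches.
function extractHouseNumber(string $txt): ?string
{
    // The "i" modifier is an assumption added here so "3A" behaves like "3a".
    if (preg_match('~\b\d+(?: ?[a-z])?\b~i', $txt, $m) === 1) {
        return $m[0];
    }
    return null;
}

$samples = [
    'Wagnerstrasse 3a',
    'Wagnerstrasse 3 a',
    'Wagnerstrasse 3',
    'Wagnerstrasse 3 a bac',
];

foreach ($samples as $s) {
    // Prints the extracted part for each sample string.
    var_dump(extractHouseNumber($s));
}
```

In "3 a bac" the letter "a" is kept because a word boundary follows it, whereas in "3 bac" the group is dropped because "b" is followed by another letter.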

Note: to avoid matching a number that is part of the street name, you can anchor the pattern to the first comma with a lookahead, for example:

preg_match('~\b\d+(?: ?[a-z])?\b(?= [^\s]*,)~', $txt, $m)

I leave it to you to adapt this subpattern to your cases.
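To see what the lookahead buys you, here is a small sketch. The second sample address is invented for illustration and has a number inside the street name; the pattern itself is the one from the note.

```php
<?php
// Pattern from the note; only the sample strings below are assumptions.
$pattern = '~\b\d+(?: ?[a-z])?\b(?= [^\s]*,)~';

$samples = [
    'Wagnerstrasse 3a platz53,eingang 3,Zi.3005',
    // Hypothetical address with a number that belongs to the street name.
    'Strasse des 17 Juni 135 platz53,eingang 3',
    // No comma at all: the lookahead cannot succeed here.
    'Wagnerstrasse 3a',
];

$results = [];
foreach ($samples as $txt) {
    // Keep the whole match, or null when the pattern does not match.
    $results[] = preg_match($pattern, $txt, $m) === 1 ? $m[0] : null;
}

print_r($results);
```

The lookahead expects exactly one space-free token between the house number and the first comma, so "17" is skipped and "135" is picked up, but an address without a trailing ", ..." part gives no match at all. That is the part you would probably need to loosen for real data.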

<?php
$s1 = 'Wagnerstrasse 3 platz53,eingang 3,Zi.3005';
$s2 = 'Wagnerstrasse 3a platz53,eingang 3,Zi.3005';
$s3 = 'Wagnerstrasse 3A platz53,eingang 3,Zi.3005';
$s4 = 'Wagnerstrasse 3 a platz53,eingang 3,Zi.3005';
$s5 = 'Wagnerstrasse 3 A platz53,eingang 3,Zi.3005';

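// Test with each of $s1 to $s5.
// [A-Za-z] is used instead of [A-z], which would also match [ \ ] ^ _ and `.
preg_match('#^(.+? [0-9]* *[A-Za-z]?)[^A-Za-z]#', $s1, $m);

// If you want only the house number:
//preg_match('#^.+? ([0-9]* *[A-Za-z]?)[^A-Za-z]#', $s1, $m);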

echo $m[1];
?>
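To actually run all five samples in one go, a loop along these lines might help. It is only a sketch: it uses the house-number-only variant of the pattern, writes the letter class as [A-Za-z], and the variable $houseNumbers is a name invented here.

```php
<?php
$samples = [
    'Wagnerstrasse 3 platz53,eingang 3,Zi.3005',
    'Wagnerstrasse 3a platz53,eingang 3,Zi.3005',
    'Wagnerstrasse 3A platz53,eingang 3,Zi.3005',
    'Wagnerstrasse 3 a platz53,eingang 3,Zi.3005',
    'Wagnerstrasse 3 A platz53,eingang 3,Zi.3005',
];

$houseNumbers = [];
foreach ($samples as $s) {
    // Capture group 1 holds only the house number in this variant.
    if (preg_match('#^.+? ([0-9]* *[A-Za-z]?)[^A-Za-z]#', $s, $m) === 1) {
        $houseNumbers[] = $m[1];
    } else {
        $houseNumbers[] = null;
    }
}

print_r($houseNumbers);
```

Keep in mind that this pattern looks only at what follows the first space, so a street name made of several words would need a different approach.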

After doing some more research and hours of checking addresses (so many addresses), I found a solution which has not failed so far. I may have missed a failure, but it seems to work quite well, and it is a regex I had not seen before. The regex fails if there are no digits in the line, so I added a hack (note the long run of nines).

Basically, the regex is excellent at finding numbers at the end and preserves numbers in the middle of the text, but it fails in the case mentioned above and when the street starts with a number. So I added another small hack: a leading number is moved to the end of the string and captured there as the house number.

if ($this->startsWithNumber($data))
{
    $tmp = explode(' ', $data);
    $data = trim(str_replace($tmp[0], '', $data)) . ' ' . $tmp[0];
}
if (!preg_match('/[0-9]/',$data)) 
{
    $data .= ' 99999999999999999999999999999999999999999999999999999999999999999999999';
}
$data = preg_replace("/[^ \w]+/",'',$data);

$pcre = '/\A\s*
(.*?) # street
\s*
\x2f? # slash
(
    \pN+\s*[a-zA-Z]? # number + letter
    (?:\s*[-\x2f\pP]\s*\pN+\s*[a-zA-Z]?)* # cut
) # number
\s*\z/ux';
// The pattern is stored in $pcre, so that is what must be passed here.
preg_match($pcre, $data, $h);

// Discard the placeholder run of nines that was appended when the input had no digits.
// Matching a long run of nines avoids having to keep two very long literals identical.
if (isset($h[2]) && preg_match('/9{20,}/', $h[2]) === 1) {
    $h[2] = null;
}
$this->receiverStreet[] = (isset($h[1])) ? $h[1] : null;
$this->receiverHouseNo[] = (isset($h[2])) ? $h[2] : null;

public function startsWithNumber($str)
{
    return preg_match('/^\d/', $str) === 1;
}
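For anyone who wants to try the idea outside of a class, here is a trimmed-down standalone sketch. It is not the code above: the function name splitStreetAndNumber and the returned array keys are invented, the pattern is reduced to "number plus optional letter at the end", and instead of the run of nines it simply checks the return value of preg_match.

```php
<?php
// Hypothetical standalone variant; name and return shape are assumptions.
function splitStreetAndNumber(string $data): array
{
    $data = trim($data);

    // Same idea as above: move a leading number to the end of the string.
    if (preg_match('/^\d/', $data) === 1) {
        $parts = explode(' ', $data, 2);
        if (count($parts) === 2) {
            $data = trim($parts[1]) . ' ' . $parts[0];
        }
    }

    // Strip everything except spaces and word characters, as in the original.
    $data = preg_replace('/[^ \w]+/', '', $data);

    // Simplified pattern: street first, then a number with an optional letter.
    $pcre = '/\A\s*
        (.*?)                 # street
        \s*
        (\pN+\s*[a-zA-Z]?)    # number + optional letter
        \s*\z/ux';

    // Swapped-in technique: test the return value instead of appending nines.
    if (preg_match($pcre, $data, $h) !== 1) {
        return ['street' => $data, 'houseNo' => null];
    }

    return ['street' => $h[1], 'houseNo' => $h[2]];
}

print_r(splitStreetAndNumber('Wagnerstrasse 3a'));
print_r(splitStreetAndNumber('3 Wagnerstrasse'));
print_r(splitStreetAndNumber('Wagnerstrasse'));
```

Like the original, this only finds a number at the very end of the string, so input such as "Wagnerstrasse 3 a bac" comes back with houseNo set to null.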

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 