PHP regex to match Latin words that may contain symbols, digits and spaces

Question

I have record names that mix Cyrillic and Latin words, symbols, spaces, digits, etc.

I need to match (with PHP's preg_match) only the Latin part, which may contain any of these symbols in any combination.

Test set:

БлаблаБла Uty-223
Блабла (бла.)Бла CAROP-C
Бла бла ST.MORITZ
Бла бла RAMIRO2-TED
LA PLYSGNE 1 H - 001

(Блабла) stands for Cyrillic words that don't matter.

So I tried this pattern:

/[-0-9a-zA-Z.]+/

But for [Блабла (бла.)Бла CAROP-C] and [LA PLYSGNE 1 H - 001] the Latin part is not matched as a single string.

Next I tried to write a more flexible pattern:

/[-0-9a-zA-Z]+(?:.)?+(?:\s+)?+[-0-9a-zA-Z]+/

But there is still a problem matching [LA PLYSGNE 1 H - 001].

Any idea how this can be solved?

Thanks.

Answer 1

If . and - cannot occur at the beginning or end of the match, start with [0-9a-zA-Z]+ and then optionally repeat a group made of one or more separator chars from the character class (dot, horizontal whitespace, hyphen) followed again by [0-9a-zA-Z]+:

\b[0-9a-zA-Z]+(?:[.\h-]+[0-9a-zA-Z]+)*\b

The \b is a word boundary preventing a partial word match
\h matches a horizontal whitespace character

See a regex101 demo.
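
As a rough sketch of how this first pattern could be used from PHP, the snippet below runs it over the test set from the question. The helper name extractLatinPart is made up for illustration, and the u modifier is an assumption added here (it is not part of the answer's pattern) so that \b and \h work on UTF-8 characters rather than on the raw bytes of the Cyrillic text.

```php
<?php
// Hypothetical helper (name not from the answer): returns the first
// Latin chunk found in $name, or null when nothing matches.
function extractLatinPart(string $name): ?string
{
    // The /u modifier is an assumption added for UTF-8 input; the rest
    // of the pattern is taken verbatim from the answer.
    $pattern = '/\b[0-9a-zA-Z]+(?:[.\h-]+[0-9a-zA-Z]+)*\b/u';

    return preg_match($pattern, $name, $m) === 1 ? $m[0] : null;
}

// Test set from the question.
$names = [
    'БлаблаБла Uty-223',
    'Блабла (бла.)Бла CAROP-C',
    'Бла бла ST.MORITZ',
    'Бла бла RAMIRO2-TED',
    'LA PLYSGNE 1 H - 001',
];

foreach ($names as $name) {
    echo $name, ' => ', var_export(extractLatinPart($name), true), PHP_EOL;
}
```

With this setup each of the five names should yield its full Latin part, including [LA PLYSGNE 1 H - 001] as one string, because [.\h-]+ lets a run such as " - " join two alphanumeric chunks.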

Alternatively, to match at least a single char [0-9a-zA-Z], allow . and - anywhere in the match, and assert whitespace boundaries to the left and right:

(?<!\S)[.-]*\b[0-9a-zA-Z](?:[0-9a-zA-Z.\h-]*[0-9a-zA-Z.-])?(?!\S)

(?<!\S) and (?!\S) are lookaround assertions that act as whitespace boundaries: they assert that there is no non-whitespace char directly to the left and to the right of the match, respectively.

See a regex101 demo.
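
A small sketch of the whitespace-boundary variant follows, under the same assumptions as before: the helper name extractLatinToken is hypothetical and the u modifier is added here for UTF-8 input. The last input, with a leading dot, is an invented example and not part of the question's test set. It is there only to show that this variant keeps a leading . or - in the match.

```php
<?php
// Hypothetical helper (name not from the answer): returns the first
// whitespace-delimited Latin chunk in $name, or null when nothing matches.
function extractLatinToken(string $name): ?string
{
    // Pattern from the answer; the /u modifier is an assumption added
    // here so the lookarounds and \b see UTF-8 characters, not bytes.
    $pattern = '/(?<!\S)[.-]*\b[0-9a-zA-Z](?:[0-9a-zA-Z.\h-]*[0-9a-zA-Z.-])?(?!\S)/u';

    return preg_match($pattern, $name, $m) === 1 ? $m[0] : null;
}

$names = [
    'Блабла (бла.)Бла CAROP-C', // from the question
    'LA PLYSGNE 1 H - 001',     // from the question
    'Бла .NET-5',               // invented input with a leading dot
];

foreach ($names as $name) {
    echo $name, ' => ', var_export(extractLatinToken($name), true), PHP_EOL;
}
```

The trade-off compared with the \b version is that the match must be surrounded by whitespace or the string edges, so a Latin chunk directly followed by, say, a comma would not be matched.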

Answer 2

You can also use a script run that starts with a Latin letter:

~(*sr:\p{Latin}.*\S)~u

demo
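
For completeness, here is a minimal sketch of calling the script-run pattern from PHP. The helper name extractLatinRun is hypothetical. Script runs are a PCRE2 feature, so this assumes the PHP build links against a PCRE2 version that supports (*sr:...). On a build without it the pattern would fail to compile, preg_match would return false, and the helper below would return null.

```php
<?php
// Hypothetical helper (name not from the answer): returns the first
// Latin script run in $name, or null if there is no match (or if the
// pattern is not supported by the installed PCRE2).
function extractLatinRun(string $name): ?string
{
    // Pattern taken verbatim from the answer: a Latin letter, then
    // anything up to the last non-whitespace char, all within one
    // script run (Latin plus "Common" chars such as digits and spaces).
    $pattern = '~(*sr:\p{Latin}.*\S)~u';

    return preg_match($pattern, $name, $m) === 1 ? $m[0] : null;
}

// Test set from the question.
$names = [
    'БлаблаБла Uty-223',
    'Блабла (бла.)Бла CAROP-C',
    'Бла бла ST.MORITZ',
    'Бла бла RAMIRO2-TED',
    'LA PLYSGNE 1 H - 001',
];

foreach ($names as $name) {
    echo $name, ' => ', var_export(extractLatinRun($name), true), PHP_EOL;
}
```

Note that \p{Latin}.*\S needs at least two characters, so a name whose Latin part is a single letter would not match with this pattern as written.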

2 answers

solution1
1 ACCEPTED 2023-01-21 12:50:40

solution2
1 2023-01-23 19:34:16