
Pattern matching for Swedish characters

I need help with a regular expression.

I have to match strings like this: âãa34dc

The pattern I have used:

\s*[a-zA-Z]+[a-zA-Z_0-9]*\s

but this pattern does not match strings such as âãa34dc.

PS: â and ã are Swedish characters.

Please help me find the correct pattern for this kind of string.

Do you actually want to restrict it to Swedish characters? In other words, should a German character not match? If so, then you'll probably have to enumerate the whole alphabet, and include that.
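If you do want the restricted version, a minimal sketch in Python could look like the following. It assumes that "Swedish characters" means a-z plus å, ä and ö in both cases, and the names SWEDISH_LETTERS and is_swedish_identifier are made up for illustration. Under that assumption the example string from the question is rejected, because â and ã are not in the enumerated set.

```python
import re

# Assumption: the allowed letters are a-z plus å, ä, ö (both cases).
# Extend this string if your definition of the alphabet differs.
SWEDISH_LETTERS = "a-zA-ZåäöÅÄÖ"

# One letter first, then any mix of letters, digits and underscores.
SWEDISH_IDENT = re.compile(rf"[{SWEDISH_LETTERS}][{SWEDISH_LETTERS}_0-9]*")


def is_swedish_identifier(text: str) -> bool:
    """Return True if the whole string fits the enumerated pattern."""
    return SWEDISH_IDENT.fullmatch(text) is not None
```

Listing the letters one by one keeps the class strict; a German ü or a Portuguese ã will not slip through.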

If what you really want is to match every alphabetic character, use the regular expression terms for matching all letters.

\w matches any word character, but that includes digits and the underscore (and, in some engines, other connector punctuation). That's close, but not exactly what you want for the first term, which should be a letter only.

For the first term, where you don't want to include numbers, specifying that the character should be a Unicode 'letter' class will work. \p{L} specifies all Unicode characters that are a letter. This includes [a-zA-Z], and all the Swedish characters, and German, and Russian, etc.
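To see what the Unicode letter class covers, here is a rough sketch in Python. As far as I know Python's built-in re module does not accept \p{L}, so this version checks the Unicode general category with the standard unicodedata module instead of a regex. The function names are hypothetical, and the second function is only meant to mirror "one letter, then letters, ASCII digits or underscores".

```python
import unicodedata


def is_unicode_letter(ch: str) -> bool:
    # General categories Lu, Ll, Lt, Lm and Lo all start with "L";
    # together they correspond to the Unicode "letter" class.
    return unicodedata.category(ch).startswith("L")


def is_letter_led_word(text: str) -> bool:
    # First character must be a letter in any script.
    if not text or not is_unicode_letter(text[0]):
        return False
    # Remaining characters: letters, ASCII digits, or underscore.
    return all(
        is_unicode_letter(c) or c == "_" or c in "0123456789"
        for c in text[1:]
    )
```

Note that this works on precomposed characters. If the input may contain combining marks (for example "a" followed by a separate circumflex), normalizing it to NFC first with unicodedata.normalize is probably a good idea.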

Therefore, I think this regular expression is what you want:

\s*[\p{L}][\p{L}_0-9]*\s

If you want to include digits from other character sets, and some other punctuation, then you can use [\w]* for the second term.

Please give a set of rules.

Going by the example in your question:

    [X-Ya-zA-Z]{3}[0-9]{2}[a-zA-Z]{2}

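Replace X with the first Swedish letter

Replace Y with the last Swedish letter

Note that a range in a character class goes by code point, so a range such as ä-ö also covers non-Swedish letters like ç and é. Listing the letters individually (åäö) is stricter.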

John Machin provides a great answer for this. Adapting his pattern, what you need is probably something similar to: \s*[^\W\d_]\w*\s*

PS I removed the + quantifier from your first part. Any subsequent letters would be matched by the subsequent quantified \w .
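A small sketch of that pattern with Python's standard re module, where str patterns are Unicode-aware by default. The names IDENT and is_identifier are only for illustration, and fullmatch is my own choice here so that the whole string has to fit.

```python
import re

# [^\W\d_] reads as "a word character that is neither a digit nor an
# underscore", which is roughly "a letter" in any script.
# \w* then accepts further letters, digits and underscores.
IDENT = re.compile(r"\s*[^\W\d_]\w*\s*")


def is_identifier(text: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return IDENT.fullmatch(text) is not None
```

With this, the string âãa34dc from the question is accepted (with or without surrounding whitespace), while a string that starts with a digit or an underscore is not.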

