Regex for multiple words between special characters

I'm trying to match every group of one or more words that sits between certain special characters, using a regular expression in Java. Here are some sample strings to clarify:

{ ? <> <> ; <> ? ; <> ? . ? <> ? . ? <> ? . ? <> ? }
{ <> <> ? . <> <> ? }
{ <> <> <> }
{ OPTIONAL { <> <> ? } FILTER ( ! bound(?) ) }
{ FILTER not exists ( ! bound(?) ) }
{ <> <> ? . ? <> ? }
{ ? <> <> ; a <> }
{ <> <> ?@en }
{ <> <> <> }
{ <> <> ? . <> <> ? FILTER ( ? > ? ) }
{ <> <> ? . ? <> ? FILTER regex(? ?) }
{ <> <> ? FILTER ( ! bound(?) ) }
{ ? <> ? ; <> ? . ? <> ? }
{ ? <> ? ; <> ? . ?2 <> ? ; <> ? }
{ ? <> <> ; <> ? . ? <> ? }
{ <> <> ? . <> <> ? FILTER ( ? = ? ) }

My matches should look like this:

OPTIONAL
FILTER
bound
FILTER not exists
bound
...

This is the regex I've come up with so far:

[^\d\W\\a\@]+

You can test it here: https://regex101.com/r/cP3Uri/2

My problem is that my regex finds only single words, not groups of words (with a space in between). This means the substring FILTER not exists yields 3 matches (one per word), but I want it to be just one match.
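A minimal Java sketch that reproduces the behaviour described above. The class name SingleWordDemo and the helper findAll are made up for illustration; only the pattern comes from the post, with its backslashes doubled for a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original post.
class SingleWordDemo {
    // The pattern from the question, escaped for a Java string literal.
    static final Pattern WORDS = Pattern.compile("[^\\d\\W\\\\a\\@]+");

    // Collects every match of WORDS in the input, in order.
    static List<String> findAll(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = WORDS.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [FILTER, not, exists, bound]: the phrase is split into three matches.
        System.out.println(findAll("{ FILTER not exists ( ! bound(?) ) }"));
    }
}
```

The character class never admits a space, so each match has to stop at the first blank.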

Can anyone help me find the correct regex?

You can use [a-zA-Z]{2}[a-zA-Z ]*\b to match groups that start with a word of at least two characters (in a Java string literal, write the backslash as \\b).

  • [a-zA-Z]{2} : match exactly 2 upper- or lower-case letters
  • [a-zA-Z ]*\b : match zero or more letters or spaces, ending at a word boundary
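As a rough sketch of how this pattern behaves on two of the sample strings, assuming a plain Matcher.find() loop (the class name LetterSpaceDemo and the helper findGroups are illustrative only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original post.
class LetterSpaceDemo {
    // Two letters, then any run of letters or spaces, ending at a word boundary.
    static final Pattern GROUPS = Pattern.compile("[a-zA-Z]{2}[a-zA-Z ]*\\b");

    // Collects every match of GROUPS in the input, in order.
    static List<String> findGroups(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = GROUPS.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [FILTER not exists, bound]
        System.out.println(findGroups("{ FILTER not exists ( ! bound(?) ) }"));
        // Prints [OPTIONAL, FILTER, bound]
        System.out.println(findGroups("{ OPTIONAL { <> <> ? } FILTER ( ! bound(?) ) }"));
    }
}
```

The trailing \b matters here: the greedy [a-zA-Z ]* first swallows the space after the last word, and the word boundary forces the match to back off to the end of that word.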

To match a word followed only by further words, separated by optional whitespace, use

[a-zA-Z]{2}(?:\s*[a-zA-Z]{2,})*
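A small sketch of this variant in Java, again with made-up names (WordChainDemo, findChains). It runs the pattern over two of the sample strings from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original post.
class WordChainDemo {
    // A word of two or more letters, followed by further such words
    // separated by optional whitespace.
    static final Pattern CHAIN = Pattern.compile("[a-zA-Z]{2}(?:\\s*[a-zA-Z]{2,})*");

    // Collects every match of CHAIN in the input, in order.
    static List<String> findChains(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = CHAIN.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [FILTER regex]
        System.out.println(findChains("{ <> <> ? . ? <> ? FILTER regex(? ?) }"));
        // Prints []: the single letter 'a' is too short to start a match
        System.out.println(findChains("{ ? <> <> ; a <> }"));
    }
}
```

Because whitespace is only consumed when another word follows it, this form needs no trailing \b to avoid capturing a trailing space.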

\w+(?:\s*\w+)*

captures all groups, including the single characters 'a' and '2'.

\w{2}(?:\s*\w+)*

captures only groups that begin with at least two word characters, which skips the lone 'a' and '2'.

You can replace \w with [a-zA-Z] to exclude digits (and the underscore).

See https://regex101.com/r/cP3Uri/7
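To make the difference between the two \w patterns concrete, here is a hedged Java sketch. The class WordCharDemo and its find helper are illustrative names, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original post.
class WordCharDemo {
    // Matches any run of words, including single-character ones.
    static final Pattern ANY_LENGTH = Pattern.compile("\\w+(?:\\s*\\w+)*");
    // Requires the group to start with at least two word characters.
    static final Pattern MIN_TWO = Pattern.compile("\\w{2}(?:\\s*\\w+)*");

    // Collects every match of the given pattern in the input, in order.
    static List<String> find(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String withA = "{ ? <> <> ; a <> }";
        String withDigit = "{ ? <> ? ; <> ? . ?2 <> ? ; <> ? }";
        System.out.println(find(ANY_LENGTH, withA));     // prints [a]
        System.out.println(find(ANY_LENGTH, withDigit)); // prints [2]
        System.out.println(find(MIN_TWO, withA));        // prints []
        System.out.println(find(MIN_TWO, withDigit));    // prints []
    }
}
```

Both patterns return FILTER not exists and bound for the string { FILTER not exists ( ! bound(?) ) }.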

You can use one of these, which keep your original character class:

[^\d\W\\a\@]([^\d\W\\a\@]| )*\b
[^\d\W\\a\@]+( +[^\d\W\\a\@]+)*

See demo: 1 and 2
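For use from Java, both patterns need their backslashes doubled inside a string literal. A minimal sketch under that assumption (OriginalClassDemo and find are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name is not from the original post.
class OriginalClassDemo {
    // The asker's character class, escaped for a Java string literal.
    static final String WORD_CHAR = "[^\\d\\W\\\\a\\@]";

    // First variant: one class character, then class characters or spaces, ending at \b.
    static final Pattern WITH_BOUNDARY =
            Pattern.compile(WORD_CHAR + "(" + WORD_CHAR + "| )*\\b");
    // Second variant: a word, then further words each preceded by one or more spaces.
    static final Pattern WITH_SPACES =
            Pattern.compile(WORD_CHAR + "+( +" + WORD_CHAR + "+)*");

    // Collects every match of the given pattern in the input, in order.
    static List<String> find(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "{ FILTER not exists ( ! bound(?) ) }";
        System.out.println(find(WITH_BOUNDARY, sample)); // prints [FILTER not exists, bound]
        System.out.println(find(WITH_SPACES, sample));   // prints [FILTER not exists, bound]
    }
}
```

Since the class still excludes the letter a, neither variant matches anything in { ? <> <> ; a <> }.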
