
Java regex for any symbol?

Is there a regex which accepts any symbol?

EDIT: To clarify what I'm looking for: I want to build a regex which will accept any number of whitespace characters, and it must contain at least 1 symbol (e.g. , . " ' $ £ etc.) or (not exclusive or) at least 1 character.

Yes. The dot ( . ) will match any symbol, at least if you use it together with the Pattern.DOTALL flag (otherwise it won't match new-line characters). From the docs:

In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.
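As a minimal sketch of the difference the flag makes (the class name DotAllDemo and the helper dotMatches are made up for illustration, they are not from the docs):

```java
import java.util.regex.Pattern;

final class DotAllDemo {
    // Returns true if a single "." matches the whole input,
    // compiled either with or without the DOTALL flag.
    static boolean dotMatches(String input, boolean dotAll) {
        Pattern p = dotAll
                ? Pattern.compile(".", Pattern.DOTALL)
                : Pattern.compile(".");
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(dotMatches("$", false));  // true
        System.out.println(dotMatches("\n", false)); // false
        System.out.println(dotMatches("\n", true));  // true
    }
}
```

The same flag can also be switched on inside the pattern itself with the embedded flag expression (?s).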


Regarding your edit:

I want to build a regex which will accept any number of whitespace characters, and it must contain at least 1 symbol (e.g. , . " ' $ £ etc.) or (not exclusive or) at least 1 character.

Here is a suggestion:

\s*\S+
  • \s* any number of whitespace characters
  • \S+ one or more ("at least one") non-whitespace characters.
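A rough sketch of how that suggestion could be used from Java, assuming the whole input should match (the class name WhitespaceThenNonWhitespace and the helper accepts are hypothetical):

```java
import java.util.regex.Pattern;

final class WhitespaceThenNonWhitespace {
    // Inside a Java string literal every backslash of the regex is doubled.
    private static final Pattern PATTERN = Pattern.compile("\\s*\\S+");

    // True if the entire input is optional whitespace followed by
    // one or more non-whitespace characters.
    static boolean accepts(String input) {
        return PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("   $"));  // true
        System.out.println(accepts("abc"));   // true
        System.out.println(accepts("   "));   // false
        System.out.println(accepts(""));      // false
    }
}
```

Note that matches() requires the whole string to fit the pattern, so an input like "a b" is rejected here. If a match anywhere in the input is enough, find() could be used instead.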

In Java, a symbol is \pS , which is not the same as punctuation characters, which are \pP .

I talk about this issue, and enumerate the types of all the ASCII punctuation and symbol characters, here in this answer.
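To make the symbol/punctuation split concrete, here is a small sketch that classifies the sample characters from the question (the class name SymbolVsPunctuation and both helper methods are invented for this example):

```java
import java.util.regex.Pattern;

final class SymbolVsPunctuation {
    // \pS is the Unicode "Symbol" category, \pP is "Punctuation".
    private static final Pattern SYMBOL = Pattern.compile("\\pS");
    private static final Pattern PUNCTUATION = Pattern.compile("\\pP");

    static boolean isSymbol(String ch) {
        return SYMBOL.matcher(ch).matches();
    }

    static boolean isPunctuation(String ch) {
        return PUNCTUATION.matcher(ch).matches();
    }

    public static void main(String[] args) {
        // Sample characters from the question; the pound sign is written as an escape.
        String[] samples = {",", ".", "\"", "'", "$", "\u00A3"};
        for (String s : samples) {
            System.out.println(s + " symbol=" + isSymbol(s)
                    + " punctuation=" + isPunctuation(s));
        }
    }
}
```

Of those samples, the comma, full stop and both quote marks are punctuation, while the dollar and pound signs are symbols, so a pattern meant to accept all of them would need both categories, e.g. [\pS\pP].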

Patterns like [\p{Alnum}\s] only work on legacy datasets from the 1960s. To work on things with Java's native character set, you need something on the order of

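// Letters, marks, decimal digits, letter numbers, connector punctuation, and enclosed-alphanumeric symbols.
String identifier_charclass = "[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]";
// Unicode whitespace, listed code point by code point.
String whitespace_charclass = "[\\u000A\\u000B\\u000C\\u000D\\u0020\\u0085\\u00A0\\u1680\\u180E\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2007\\u2008\\u2009\\u200A\\u2028\\u2029\\u202F\\u205F\\u3000]";

// Nested classes form a union: one identifier character or one whitespace character.
String ident_or_white = "[" + identifier_charclass + whitespace_charclass + "]";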
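For what it's worth, a sketch of how those classes might be compiled and applied. The wrapper class IdentOrWhiteDemo, the helper isIdentOrWhite, and the trailing + quantifier are assumptions added for the example; only the two character-class strings come from the snippet above.

```java
import java.util.regex.Pattern;

final class IdentOrWhiteDemo {
    static final String IDENTIFIER_CHARCLASS =
            "[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]";
    static final String WHITESPACE_CHARCLASS =
            "[\\u000A\\u000B\\u000C\\u000D\\u0020\\u0085\\u00A0\\u1680\\u180E"
            + "\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2007\\u2008"
            + "\\u2009\\u200A\\u2028\\u2029\\u202F\\u205F\\u3000]";

    // Union of both classes, repeated one or more times (the + is an assumption).
    private static final Pattern IDENT_OR_WHITE =
            Pattern.compile("[" + IDENTIFIER_CHARCLASS + WHITESPACE_CHARCLASS + "]+");

    // True if the input consists only of identifier and whitespace characters.
    static boolean isIdentOrWhite(String input) {
        return IDENT_OR_WHITE.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Accented letter, no-break space, underscore, digits: all inside the class.
        System.out.println(isIdentOrWhite("caf\u00E9\u00A0_42")); // true
        // The colon is punctuation and the dollar sign is a currency symbol.
        System.out.println(isIdentOrWhite("price: $5"));          // false
    }
}
```

The regex engine resolves the doubled-backslash code point escapes itself, so the strings stay pure ASCII in the source file.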

I'm sorry that Java makes it so difficult to work with modern datasets, but at least it is possible.

Just don't ask about boundaries or grapheme clusters. For that, see my other postings.

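Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.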

 