Java regex replacing all the characters except a few combinations of characters
Input string: NNULL(EUR,VALUE)+SOMESTR
Expected output: NNULL(X,X)+X
Expression tried:
String str = "NNULL(EUR,VALUE)+SOMESTR";
str=str.replaceAll("[^(NNULL)\\+,]+","X");
Output I am getting:
NNULL(XUX,XLUX)+X
How do you imagine the output you want would be produced? The [...] syntax is a character class. It matches individual characters, not words or sequences. Negating the class matches any single character not in the class. So [^(NNULL)\\+,] matches every character that isn't one of (, N, U, L, ), + or a comma; the parentheses do not group anything and the repeated N and L add nothing. That is why the U in EUR and the L and U in VALUE survive in your output.
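To make the character class behaviour concrete, here is a minimal sketch that reruns the expression from the question next to a de-duplicated class. The class name CharClassDemo and the method name replaceWithCharClass are made up for illustration; only String.replaceAll comes from the JDK.

```java
class CharClassDemo {
    // Replaces every run of characters that is NOT one of ( N U L ) + , with "X".
    // Inside [...] the parentheses are literal and the repeated N and L are redundant.
    static String replaceWithCharClass(String input) {
        return input.replaceAll("[^(NNULL)\\+,]+", "X");
    }

    public static void main(String[] args) {
        String str = "NNULL(EUR,VALUE)+SOMESTR";
        // U and L are members of the class, so they survive inside EUR and VALUE
        System.out.println(replaceWithCharClass(str));           // NNULL(XUX,XLUX)+X
        // The same class written without duplicates behaves identically
        System.out.println(str.replaceAll("[^()NUL+,]+", "X"));  // NNULL(XUX,XLUX)+X
    }
}
```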
It seems like what you want to do is tokenize the string according to a set of rules that you haven't clearly defined, and then replace certain tokens.
First, define a simple regex that will match a single 'token'. From your question, I'm guessing you want to consider words and symbols, so the tokens would be, in order: NNULL ( EUR , VALUE ) + SOMESTR
Pattern pattern = Pattern.compile("\\w+|\\W"); // a word, or a single non-word character
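As a quick check of what that pattern produces, the following sketch collects the matches into a list. The class name TokenizeDemo and the method name tokenize are hypothetical helpers, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenizeDemo {
    // A word (one or more word characters), or a single non-word character
    private static final Pattern TOKEN = Pattern.compile("\\w+|\\W");

    // Returns the tokens in the order the regex finds them
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [NNULL, (, EUR, ,, VALUE, ), +, SOMESTR]
        System.out.println(tokenize("NNULL(EUR,VALUE)+SOMESTR"));
    }
}
```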
Now, find a way to specify which tokens to keep and which to replace. I used a Set containing the 'good' tokens, but any string predicate will work.
Set<String> retain = new HashSet<>(Arrays.asList("NNULL", "(", ")", ",", "+"));
All we have to do now is loop through the tokens (as identified by the regex) and see if they're in the 'good' set or not.
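StringBuilder result = new StringBuilder();
Matcher m = pattern.matcher(input); // input is the string to process, e.g. str from the question
while (m.find()) {
    String token = m.group();
    // keep 'good' tokens as they are, replace every other token with X
    result.append(retain.contains(token) ? token : "X");
}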
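Put together, the three fragments above might look like the following self-contained sketch. The class name TokenReplacer and the method name mask are assumptions made for this example; the pattern, the retain set and the loop are taken from the fragments.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenReplacer {
    // A word, or a single non-word character
    private static final Pattern TOKEN = Pattern.compile("\\w+|\\W");
    // Tokens that are copied to the output unchanged
    private static final Set<String> RETAIN =
            new HashSet<>(Arrays.asList("NNULL", "(", ")", ",", "+"));

    // Replaces every token that is not in RETAIN with "X"
    static String mask(String input) {
        StringBuilder result = new StringBuilder();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            String token = m.group();
            result.append(RETAIN.contains(token) ? token : "X");
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("NNULL(EUR,VALUE)+SOMESTR")); // NNULL(X,X)+X
    }
}
```

Note that whitespace is also a \W token, so with this retain set a space in the input would be replaced by X as well; add it to the set if that is not what you want.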
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Your suggested pattern [^NNULL] does not mean "anything but NNULL". It means any single character that is not in the character class NNULL, where the repeated N (and L) is redundant.
Use this pattern instead:
\b(?!NNULL)[^(),+]+
\b          # word boundary
(?!         # negative lookahead
  NNULL     # "NNULL"
)           # end of negative lookahead
[^(),+]     # any character not in the character class [(),+]
+           # one or more (greedy)
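In Java source the backslash has to be doubled, so the pattern above could be applied roughly as follows. The class name LookaheadDemo and the method name mask are hypothetical; the regex itself is the one given above.

```java
class LookaheadDemo {
    // At each word boundary that is not followed by NNULL,
    // replace the run of characters up to the next ( ) , or + with "X"
    static String mask(String input) {
        return input.replaceAll("\\b(?!NNULL)[^(),+]+", "X");
    }

    public static void main(String[] args) {
        System.out.println(mask("NNULL(EUR,VALUE)+SOMESTR")); // NNULL(X,X)+X
    }
}
```

One caveat: the lookahead only checks the start of the word, so a longer word that begins with NNULL (for example NNULLX) is left untouched too. If that matters, adding a word boundary inside the lookahead, as in (?!NNULL\b), is one possible way to tighten it.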