
How to find a string that does not contain a specific string in Java using regex

I am trying to filter out those strings in a list of strings which do not contain a specific substring (in my case, simply today's date in the YYYY-MM-DD format), but I have been unable to do so.

This is what I have tried so far.

My string is in this format: ABC-TEST.20181206.20181208-20180215.log

The string may also be in this format: ABC-TEST.20181206.20181208-20180215-1.log (the 1 before .log is a counter that can grow without limit)

If the string ends with today's date (immediately before .log, or before a -1.log style suffix), I need to find it.

If it does not end with today's date, I need to filter it out.

I have tried this pattern to identify the files which contain today's date, but I am unable to find the strings which do not contain it:

(.*?)-20180221-?(\\d+)?.log

This is one of the patterns I have tried without luck:

(.*?)-^((?!20180221))-?(\\d+)?.log

If you are willing to accept some reasonable limit on the number of digits in the counter, then you could use a basic negative lookbehind, like this:

String pattern = ".*(?<!20180215(-[0-9]{1,7})?\\.log)$";
String false1 = "ABC-TEST.20181206.20181208-20180215.log";
String false2 = "ABC-TEST.20181206.20181208-20180215-1.log";
String true1 = "ABC-TEST.20181206.20181208-20180216.log";
String true2 = "ABC-TEST.20181206.20181208-20180216-1.log";

System.out.println(false1.matches(pattern)); // false
System.out.println(false2.matches(pattern)); // false
System.out.println(true1.matches(pattern)); // true
System.out.println(true2.matches(pattern)); // true
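
The snippet above hard-codes the date. As a rough sketch only, the same lookbehind could be assembled from a date value and applied to a whole list. The class name `NotTodayFilter` and its method names are made up for illustration, and the 7-digit counter limit is carried over unchanged from the pattern above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original answer.
class NotTodayFilter {
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Builds the negative-lookbehind pattern for the given date.
    // Assumption: the counter has at most 7 digits, as in the answer.
    static Pattern notEndingWith(LocalDate date) {
        String stamp = date.format(STAMP); // e.g. 20180215
        return Pattern.compile(".*(?<!" + stamp + "(-[0-9]{1,7})?\\.log)$");
    }

    // Keeps only the names that do NOT end with the date (plus optional counter).
    static List<String> keepNotEndingWith(LocalDate date, List<String> names) {
        Pattern p = notEndingWith(date);
        return names.stream()
                .filter(n -> p.matcher(n).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "ABC-TEST.20181206.20181208-20180215.log",
                "ABC-TEST.20181206.20181208-20180215-1.log",
                "ABC-TEST.20181206.20181208-20180216.log",
                "ABC-TEST.20181206.20181208-20180216-1.log");
        // Use LocalDate.now() in real code; a fixed date keeps the demo repeatable.
        System.out.println(keepNotEndingWith(LocalDate.of(2018, 2, 15), names));
    }
}
```

With the fixed date 2018-02-15, only the two names ending in 20180216 remain in the result.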

I would like to point out that if the trailing counter is allowed to have more than 7 digits, then the algorithm you have outlined becomes ambiguous, because at that point there is no way to distinguish between the counter 20180215 and the date 20180215.


Question Evolution #1

The question writer has used the comment section on this answer to change his requirements as follows:

the string should contain "TEST"

Answer

You would just add .*TEST to the front of the pattern in this answer, like so:

String pattern = ".*TEST.*(?<!20180215(-[0-9]{1,7})?\\.log)$";

Question Evolution #2

The question writer has used the comment section on this answer to change his requirements as follows:

to pick up the String which does NOT contain TEST and which does NOT contain today's date

Answer

You could use a negative lookahead for "TEST" at every repetition of the initial wildcard, like so:

String pattern = "((?!TEST).)*(?<!20180215(-[0-9]{1,7})?\\.log)";
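
To make the behaviour of that last pattern concrete, here is a small check, assuming it is used with `matches()` (which requires the whole string to match). The class name `NoTestNoDate` and the "PROD" sample names are invented for the demonstration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the sample file names are made up.
class NoTestNoDate {
    // ((?!TEST).)* consumes one character at a time, but only while "TEST"
    // does not start at the current position; the lookbehind then rejects
    // names that end with the date (plus an optional counter of up to 7 digits).
    static final Pattern P =
            Pattern.compile("((?!TEST).)*(?<!20180215(-[0-9]{1,7})?\\.log)");

    static boolean accepts(String name) {
        return P.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("ABC-PROD.20181206.20181208-20180216.log"));   // true
        System.out.println(accepts("ABC-TEST.20181206.20181208-20180216.log"));   // false: contains TEST
        System.out.println(accepts("ABC-PROD.20181206.20181208-20180215.log"));   // false: ends with the date
        System.out.println(accepts("ABC-PROD.20181206.20181208-20180215-3.log")); // false: date plus counter
    }
}
```

If the regex becomes hard to read, checking `!name.contains("TEST")` separately and keeping only the date lookbehind in the pattern would be an equivalent alternative.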

Well, in my timezone it is still the 14th of February, so I used:

egrep -- "-$(date "+%Y%m%d" -d now+1day )-?.log" sample
My String is in this format - ABC-TEST.20181206.20181208-20180215.log

Well - you're using Java? Why not?

-> import java.util.Date
-> Date d = new Date()
-> String today = String.format ("%tY%tm%td", d, d, d)
-> String s1 = "My String is in this format - ABC-TEST.20181206.20181208-20180214.log"
-> String s2 = "The string may also be in this format - ABC-TEST.20181206.20181208-20180214-1.log (the 1 before .log may go on up to infinite)"
-> String pattern = ".*" + today + "\\.log"
-> s1.matches (pattern) 
|  Expression value is: true
|    assigned to temporary variable $39 of type boolean
-> s2.matches (pattern) 
|  Expression value is: false
|    assigned to temporary variable $40 of type boolean

That's copied from jshell, a fine tool for fast ad hoc testing.

 (.*?)-20180221-?(\\d+)?.log

Well, compared with today - be it the 14th or the 15th - we don't want to be too strict about it, but 02/21 is pretty far off, isn't it?

What is the first question mark supposed to do?

Can the line be any of

   -20180221.log
   -20180221-.log
   -20180221888.log
   -20180221-888.log

? If lines don't match - do you still need to find those which contain

or isn't it either

   -20180221.log  

or -20180221-888.log

Then:

   String pattern = ".*" + today + "(-[0-9]+)?\\.log";

If there might be something after .log:

   String pattern = ".*" + today + "(-[0-9]+)?\\.log.*";
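
Since the question asks for the strings that do not carry the date, one possible way to use the positive pattern above is to negate the match result. This is only a sketch: `DateSuffixFilter` and `withoutDate` are hypothetical names, and `today` is assumed to be an already formatted yyyyMMdd string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class DateSuffixFilter {
    // Returns the names that do NOT end with <today>[-<counter>].log.
    // Assumption: today is a yyyyMMdd string such as "20180215".
    static List<String> withoutDate(String today, List<String> names) {
        Pattern p = Pattern.compile(".*" + today + "(-[0-9]+)?\\.log");
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (!p.matcher(name).matches()) { // negate the positive match
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "ABC-TEST.20181206.20181208-20180215.log",
                "ABC-TEST.20181206.20181208-20180215-123456789.log",
                "ABC-TEST.20181206.20181208-20180216.log",
                "ABC-TEST.20180215.20181208-20180216-1.log");
        System.out.println(withoutDate("20180215", names));
    }
}
```

Because the match is negated in Java rather than inside the regex, no lookbehind is needed and the counter can have any number of digits.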
