Java regex: need one regex to match all the formats specified

A log file has these patterns appearing more than once in a line. For example, the file may look like:

dsads utc-hour_of_year:2013-07-30T17 jdshkdsjhf utc-week_of_year:2013-W31 dskjdskf
utc-week_of_year:2013-W31 dskdsld  fdsfd
dshdskhkds utc-month_of_year:2013-07 gfdkjlkdf

I want to replace all date-specific info with "Y".

I tried: replaceAll("_year:.*\\s", "_year:Y "); but it removes everything that occurs after the first replacement, due to the greedy match of .*:

dsads utc-hour_of_year:Y
utc-week_of_year:Y
dshdskhkds utc-month_of_year:Y

but the expected result is:

dsads utc-hour_of_year:Y jdshkdsjhf utc-week_of_year:Y dskjdskf
utc-week_of_year:Y dskdsld  fdsfd
dshdskhkds utc-month_of_year:Y gfdkjlkdf

Try using a reluctant quantifier: _year:.*?\\s

.replaceAll("_year:.*?\\s", "_year:Y ")

System.out
        .println("utc-hour_of_year:2013-07-30T17 dsfsdgfsgf utc-week_of_year:2013-W31 dsfsdgfsdgf"
                .replaceAll("_year:.*?\\s", "_year:Y "));
utc-hour_of_year:Y dsfsdgfsgf utc-week_of_year:Y dsfsdgfsdgf

I am not sure what you are really trying to do, and this answer is based only on your example. If you want to do something else, leave a comment below or edit your question with more specific information or an example.

It removes everything after _year: because you are using .*\\s which means

  • .* zero or more of any characters (except line terminators),
  • \\s followed by a whitespace character

so in the sentence

utc-hour_of_year:2013-07-30T17 dsfsdgfsgf utc-week_of_year:2013-W31 dsfsdgfsdgf

it will match

utc-hour_of_year:2013-07-30T17 dsfsdgfsgf utc-week_of_year:2013-W31 dsfsdgfsdgf
//               ^from here                                to here^
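
To see the matched span directly rather than through the replacement, here is a minimal sketch using java.util.regex. The class name GreedyMatchDemo and the helper firstMatch are hypothetical names chosen for illustration; only Pattern and Matcher come from the standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyMatchDemo {
    // Hypothetical helper: returns the first substring matched by the regex,
    // or null if the regex does not match anywhere in the input.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "utc-hour_of_year:2013-07-30T17 dsfsdgfsgf utc-week_of_year:2013-W31 dsfsdgfsdgf";

        // Greedy: .* first consumes the rest of the line, then backtracks
        // only as far as the LAST whitespace character.
        System.out.println("[" + firstMatch("_year:.*\\s", line) + "]");

        // Reluctant: .*? expands one character at a time and stops at the
        // FIRST whitespace character after _year:
        System.out.println("[" + firstMatch("_year:.*?\\s", line) + "]");
    }
}
```

The brackets in the printed output make the trailing whitespace that \\s consumed visible.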

because by default the * quantifier is greedy. To make it reluctant you need to add ? after *, so try:

  • "_year:.*?\\s"

or, even better, instead of .*? match only non-whitespace characters using \\S, which is the negation of \\s and can also be written as [^\\s]. Also, if your data can appear at the end of your input, you probably shouldn't require \\s at the end of your regex or add a space in your replacement, so try one of these:

  • .replaceAll("_year:\\S*", "_year:Y")
  • .replaceAll("_year:\\S*\\s", "_year:Y ")
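
The difference between the two variants shows up when a date is the last token on a line. The following is a minimal sketch under that assumption; DateMasker and its method names are hypothetical, and the input lines are taken from the question's sample log.

```java
public class DateMasker {
    // Replaces the run of non-whitespace characters after "_year:" with "Y".
    // Works whether or not whitespace follows the date.
    public static String maskDates(String line) {
        return line.replaceAll("_year:\\S*", "_year:Y");
    }

    // Variant that also requires one whitespace character after the date,
    // so a date at the very end of the input is left untouched.
    public static String maskDatesRequiringWhitespace(String line) {
        return line.replaceAll("_year:\\S*\\s", "_year:Y ");
    }

    public static void main(String[] args) {
        String[] lines = {
            "dsads utc-hour_of_year:2013-07-30T17 jdshkdsjhf utc-week_of_year:2013-W31 dskjdskf",
            "utc-week_of_year:2013-W31 dskdsld  fdsfd",
            "dshdskhkds utc-month_of_year:2013-07"   // date at end of input
        };
        for (String line : lines) {
            System.out.println(maskDates(line));
            System.out.println(maskDatesRequiringWhitespace(line));
        }
    }
}
```

For the third line, only maskDates replaces the date, because no whitespace follows 2013-07 for \\s to match.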
