
Matches A but not B in Java Regex?

I have a big document. Let's scale it down to:

location=State-City-House
location=City-House

What I want to do is replace all those not starting with State with some other string, say "NY". Those starting with State must remain untouched.

So my end result would be:

location=State-City-House
location=NY-City-House

1. Obviously I can't use String.replaceAll().

2. Using Pattern.matcher() is tricky, since we are dealing with two different patterns where one must be found and the other must not be found.

3. I tried a dirty way: first replacing "location=State" with "bocation=State", then replacing the others, and then reverting the first replacement.

So, is there a neat and simple way to do it?

You can definitely use replaceAll with a negative lookahead:

String repl = input.replaceAll( "(?m)^(location=)(?!State)", "$1NY-" );
  • (?m) sets the MULTILINE modifier so that the anchors ^ and $ match at the start and end of each line
  • (location=) matches location= and captures it in group #1
  • (?!State) is a negative lookahead that fails the match when State appears right after the captured group #1, i.e. location=
  • In the replacement we use $1NY- so that the line starts with location=NY-
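As a minimal, self-contained sketch of the approach above, applied to the two sample lines from the question (the class name `NegativeLookaheadDemo` and the method name `addDefaultState` are made up for illustration):

```java
public class NegativeLookaheadDemo {

    // Inserts "NY-" after "location=" on every line whose value
    // does not already start with "State".
    public static String addDefaultState(String input) {
        // (?m)       : ^ matches at the start of every line
        // (?!State)  : fail the match if "State" follows "location="
        // $1NY-      : keep "location=" and append "NY-"
        return input.replaceAll("(?m)^(location=)(?!State)", "$1NY-");
    }

    public static void main(String[] args) {
        String input = "location=State-City-House\n"
                     + "location=City-House";
        System.out.println(addDefaultState(input));
        // prints:
        // location=State-City-House
        // location=NY-City-House
    }
}
```

Because the lookahead is zero-width, nothing after `location=` is consumed, so the rest of each line is left exactly as it was.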


If I understand your intention correctly, you don't actually have the literal string "State" in your input, but varying strings that represent states. Some of your text lines are missing the state altogether and only have the names of the City and House. Is that correct? In that case, the distinguishing characteristic between the two kinds of lines is the number of dashes.

^location=([^-]+)-([^-]+)$

The above regex matches only full lines that contain exactly one dash.
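A rough sketch of how that regex could be used for the replacement, assuming "NY" is the state to fill in. The class name `DashCountDemo` and the method name `addDefaultState` are hypothetical, and the character classes are narrowed from `[^-]` to `[^-\r\n]` (a small adjustment to the regex above) so that a group cannot run across a line break in multi-line input:

```java
public class DashCountDemo {

    // Prepends "NY-" to the value of every line that has exactly one dash,
    // i.e. lines of the form location=City-House.
    public static String addDefaultState(String input) {
        // (?m) makes ^ and $ match at line boundaries.
        // $1 is the city, $2 is the house.
        return input.replaceAll(
                "(?m)^location=([^-\\r\\n]+)-([^-\\r\\n]+)$",
                "location=NY-$1-$2");
    }

    public static void main(String[] args) {
        String input = "location=State-City-House\n"
                     + "location=City-House";
        System.out.println(addDefaultState(input));
        // prints:
        // location=State-City-House
        // location=NY-City-House
    }
}
```

Lines with two dashes never match, so they pass through unchanged regardless of what the state is called.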

I might have misunderstood the task. It would be easier if you posted some of the actual input.

