RegEx match can end with different words

I want to match a few lines, but they can end differently: with "Registrar:" or "Registered:".

So I naively tried this:

Registrant's address:(\s*)(?<Value>.*).*((Registrar:)|(Registered:))

What am I doing wrong with this OR operator?

(The goal is to extract data for different TLDs with RegEx, directly from the WhoIs server output.)

1. Data:

Domain name: argos.co.uk

Registrant:
    Argos Ltd

Registrant type:
    UK Public Limited Company, (Company number: 1081551)

Registrant's address:
    Avebury 489-499 Avebury Boulevard Central Milton Keynes Milton Keynes MK9 2NW United Kingdom

Registered through:
    NetNames Limited
    URL: http://www.netnames.co.uk

I want this:

  Avebury 489-499 Avebury Boulevard Central Milton Keynes Milton Keynes MK9 2NW United Kingdom 

2. Data:

Domain name:
    amazon.co.uk

Registrant:
    Amazon Europe Holding Technologies SCS

Registrant type:
    Unknown

Registrant's address:
    65 boulevard GD. Charlotte Luxembourg City Luxembourg LU-1311 Luxembourg

Registrar:
    Amazon.com [Tag = AMAZON-COM]
    URL: http://www.amazon.com

Relevant dates:
    Registered on: before Aug-1996
    Expiry date: 05-Dec-2020
    Last updated: 23-Oct-2013

I want this:

  65 boulevard GD. Charlotte Luxembourg City Luxembourg LU-1311 Luxembourg 

It seems that you don't need regex here:

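// Requires: using System; using System.IO; using System.Linq;
var result = String.Join(Environment.NewLine, File.ReadLines(filename)
                .SkipWhile(x => !x.StartsWith("Registrant's address:"))  // skip ahead to the label line
                .Skip(1)                                                  // drop the label line itself
                .TakeWhile(x => !String.IsNullOrEmpty(x)));               // keep lines until the first blank one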
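To try the same pipeline without a file on disk, here is a minimal sketch that runs it over an in-memory string. The sample text and the variable names are assumptions made up for illustration, and it assumes that labels start at the beginning of a line and that a blank line ends each block.

```csharp
using System;
using System.Linq;

// Hypothetical WhoIs fragment: label line, value lines, then a blank line.
string whois =
    "Registrant type:\n" +
    "Unknown\n" +
    "\n" +
    "Registrant's address:\n" +
    "65 boulevard GD. Charlotte\n" +
    "Luxembourg City\n" +
    "\n" +
    "Registrar:\n" +
    "Amazon.com\n";

string[] lines = whois.Split('\n');

string address = string.Join(Environment.NewLine,
    lines.SkipWhile(l => !l.StartsWith("Registrant's address:", StringComparison.Ordinal))
         .Skip(1)                                    // drop the label line
         .TakeWhile(l => !string.IsNullOrEmpty(l))); // stop at the blank line

Console.WriteLine(address);
```

If the real output indents its lines, the StartsWith check would need a TrimStart() first.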

The : after "Registered" in your regex simply isn't present in your text (the first sample has "Registered through:"), and you'll need to specify RegexOptions.Singleline, if you haven't already, to allow . to match newlines.

Registrant's address:(\s*)(?<Value>.*).*((Registrar)|(Registered))
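The effect of RegexOptions.Singleline can be checked with a small sketch like the one below. The sample text is a shortened, hypothetical record shaped like the second one in the question, and the variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical multi-line record (shortened for the example).
string whois =
    "Registrant's address:\n" +
    "    65 boulevard GD. Charlotte\n" +
    "    Luxembourg\n" +
    "\n" +
    "Registrar:\n" +
    "    Amazon.com [Tag = AMAZON-COM]\n";

string pattern = @"Registrant's address:(\s*)(?<Value>.*).*((Registrar)|(Registered))";

// Without Singleline, '.' does not match '\n', so the pattern cannot
// get from the address lines to the "Registrar" line.
bool withoutOption = Regex.IsMatch(whois, pattern);

// With Singleline, '.' matches '\n' as well and the match succeeds.
Match withOption = Regex.Match(whois, pattern, RegexOptions.Singleline);

Console.WriteLine(withoutOption);
Console.WriteLine(withOption.Success);
Console.WriteLine(withOption.Groups["Value"].Value.Trim());
```

The captured value still carries the surrounding line breaks, hence the Trim() call.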

You have many capture groups that are probably unnecessary.

Registrant's address:\s*(?<Value>.*).*Regist(?:rar|ered)

Another note: you might hit a problem with greedy matching if the text you are matching contains consecutive records, because .* runs on to the last possible "Registrar"/"Registered". Making the capture lazy with ? solves the problem (the redundant second .* is dropped so that the lazy group still captures the text):

Registrant's address:\s*(?<Value>.*?)\s*Regist(?:rar|ered)
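The difference between greedy and lazy matching shows up as soon as two records follow each other. The sketch below uses two hypothetical, shortened records. The lazy pattern here is an assumed variant with a single lazy group followed by \s*, so the named group keeps the address text.

```csharp
using System;
using System.Text.RegularExpressions;

// Two hypothetical records back to back.
string text =
    "Registrant's address:\n    Avebury\n\nRegistered through:\n    NetNames Limited\n\n" +
    "Registrant's address:\n    65 boulevard GD. Charlotte\n\nRegistrar:\n    Amazon.com\n";

string greedy = @"Registrant's address:\s*(?<Value>.*)Regist(?:rar|ered)";
string lazy   = @"Registrant's address:\s*(?<Value>.*?)\s*Regist(?:rar|ered)";

// Greedy: a single match that runs from the first address to the last terminator.
MatchCollection greedyMatches = Regex.Matches(text, greedy, RegexOptions.Singleline);

// Lazy: one match per record, each stopping at the nearest terminator.
MatchCollection lazyMatches = Regex.Matches(text, lazy, RegexOptions.Singleline);

Console.WriteLine(greedyMatches.Count);
foreach (Match m in lazyMatches)
    Console.WriteLine(m.Groups["Value"].Value);
```

With the greedy pattern the first record's capture swallows the whole second record. The lazy pattern yields one address per record.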

You could use the following regex to match only the data you need, without capturing unnecessary data.

Using lookaround assertions:

(?<=Registrant's address:).*(?=(?:Registrar:|Registered:))

Working example:

http://regex101.com/r/cN5wP3

Just make sure to use RegexOptions.Singleline.

EDIT:

And to capture the match in the named group value, you would use this:

(?<=Registrant's address:)(?<value>.*)(?=(?:Registrar:|Registered:))

Example:

http://regex101.com/r/fY3oR9
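For reference, a minimal C# sketch of the lookaround approach might look like the following. It departs from the pattern above in two places, both of them assumptions rather than part of the original answer: the literal "Registered through:" replaces "Registered:" because that is the label in the first sample record, and the group is lazy so that it stops at the nearest terminator. ExtractAddress and the sample strings are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Lookbehind and lookahead assert the surrounding labels without consuming them.
const string Pattern =
    @"(?<=Registrant's address:)(?<value>.*?)(?=Registrar:|Registered through:)";

// Hypothetical helper name, used only for this sketch.
string ExtractAddress(string whois)
{
    Match m = Regex.Match(whois, Pattern, RegexOptions.Singleline);
    // Trim the line breaks and indentation around the captured block.
    return m.Success ? m.Groups["value"].Value.Trim() : string.Empty;
}

// Shortened, hypothetical versions of the two sample records.
string first =
    "Registrant's address:\n    Avebury\n    United Kingdom\n\n" +
    "Registered through:\n    NetNames Limited\n";
string second =
    "Registrant's address:\n    65 boulevard GD. Charlotte\n    Luxembourg\n\n" +
    "Registrar:\n    Amazon.com [Tag = AMAZON-COM]\n\n" +
    "Relevant dates:\n    Registered on: before Aug-1996\n";

Console.WriteLine(ExtractAddress(first));
Console.WriteLine(ExtractAddress(second));
```

Returning an empty string when nothing matches is just one possible convention. Callers that need to tell "no match" from "empty address" could return the Match instead.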
