RegEx match that can end with different words

Question

I want to match a few lines, but they can end differently: with "Registrar:" or with "Registered:".

So I naively tried this:

Registrant's address:(\s*)(?<Value>.*).*((Registrar:)|(Registered:))

What am I doing wrong with this OR operator?

(The goal is to extract data for different TLDs with RegEx directly from the WhoIs server.)

1. Data:

Domain name: argos.co.uk

 Registrant: Argos Ltd Registrant type: UK Public Limited Company, (Company number: 1081551) Registrant's address: Avebury 489-499 Avebury Boulevard Central Milton Keynes Milton Keynes MK9 2NW United Kingdom Registered through: NetNames Limited URL: http://www.netnames.co.uk

I want this:

  Avebury 489-499 Avebury Boulevard Central Milton Keynes Milton Keynes MK9 2NW United Kingdom

2. Data:

 Domain name: amazon.co.uk Registrant: Amazon Europe Holding Technologies SCS Registrant type: Unknown Registrant's address: 65 boulevard GD. Charlotte Luxembourg City Luxembourg LU-1311 Luxembourg Registrar: Amazon.com [Tag = AMAZON-COM] URL: http://www.amazon.com Relevant dates: Registered on: before Aug-1996 Expiry date: 05-Dec-2020 Last updated: 23-Oct-2013

I want this:

  65 boulevard GD. Charlotte Luxembourg City Luxembourg LU-1311 Luxembourg

Answer 1

It seems that you don't need regex here:

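// Requires: using System; using System.IO; using System.Linq;
var result = String.Join(Environment.NewLine, File.ReadLines(filename)
                .SkipWhile(x => !x.StartsWith("Registrant's address:")) // skip everything before the label line
                .Skip(1)                                                 // drop the label line itself
                .TakeWhile(x => !String.IsNullOrEmpty(x)));              // keep lines until the first empty one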
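The same line-based idea can be sketched against an in-memory string, which is closer to reading a WhoIs response than reading a file. This is only a sketch under assumptions: the names WhoisLines and ExtractAddress are hypothetical, the sample text is abbreviated, and it assumes the label sits on its own line with the indented address lines below it and a blank line after them. Unlike the snippet above, it trims leading whitespace before comparing, in case the label line is indented.

```csharp
using System;
using System.Linq;

public static class WhoisLines
{
    // Returns the address lines joined by newlines, or an empty string if the label is not found.
    public static string ExtractAddress(string whois)
    {
        // Split on both Windows and Unix line endings.
        string[] lines = whois.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);

        return string.Join(Environment.NewLine, lines
            .SkipWhile(x => !x.TrimStart().StartsWith("Registrant's address:", StringComparison.Ordinal))
            .Skip(1)                                        // drop the label line itself
            .TakeWhile(x => !string.IsNullOrWhiteSpace(x))  // stop at the first blank line
            .Select(x => x.Trim()));                        // remove the indentation
    }

    public static void Main()
    {
        // Abbreviated, hypothetical sample in the assumed multi-line layout.
        string sample =
            "    Registrant's address:\n" +
            "        65 boulevard GD. Charlotte\n" +
            "        Luxembourg\n" +
            "\n" +
            "    Registrar:\n" +
            "        Amazon.com [Tag = AMAZON-COM]\n";

        Console.WriteLine(ExtractAddress(sample));
    }
}
```

If the server returns the record on a single line, this approach does not apply and a regex is the more natural fit.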

Answer 2

The : in your regex simply isn't present in your text, and you'll need to specify RegexOptions.Singleline, if you haven't already, to allow . to match newlines.

Registrant's address:(\s*)(?<Value>.*).*((Registrar)|(Registered))

You have many capture groups that are probably unnecessary.

Registrant's address:\s*(?<Value>.*).*Regist(?:rar|ered)

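Another note: you might hit a problem with greedy matching if a later "Registrar" or "Registered" appears in the text you are trying to match, for example with consecutive records. Making the quantifier lazy with ? solves the problem. The redundant second .* is dropped here, because next to a lazy group it would consume the whole address and leave Value empty:

Registrant's address:\s*(?<Value>.*?)\s*Regist(?:rar|ered)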
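A minimal sketch of how a lazy pattern of this kind might be used from C#, assuming the WhoIs response is already held in a string. The names WhoisAddress and ExtractAddress and the abbreviated sample are hypothetical. The pattern uses a single lazy named group followed by \s*, so that Value holds the address without trailing whitespace.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhoisAddress
{
    // Lazy named group: stops at the first "Registrar" or "Registered" after the address.
    // Singleline lets . match newlines, since the address may span several lines.
    private static readonly Regex AddressPattern = new Regex(
        @"Registrant's address:\s*(?<Value>.*?)\s*Regist(?:rar|ered)",
        RegexOptions.Singleline);

    // Returns the captured address, or an empty string when the pattern does not match.
    public static string ExtractAddress(string whois)
    {
        return AddressPattern.Match(whois).Groups["Value"].Value;
    }

    public static void Main()
    {
        // Abbreviated, hypothetical sample shaped like the second record in the question.
        string sample =
            "Registrant's address:\n" +
            "    65 boulevard GD. Charlotte\n" +
            "    Luxembourg\n" +
            "\n" +
            "Registrar:\n" +
            "    Amazon.com [Tag = AMAZON-COM]\n" +
            "Relevant dates:\n" +
            "    Registered on: before Aug-1996\n";

        Console.WriteLine(ExtractAddress(sample));
    }
}
```

Because the group is lazy, the later "Registered on:" in the sample does not extend the capture; a greedy group would run on to that last occurrence.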

Answer 3

You could use the following regex to match only the data you need without capturing unnecessary data.

Using lookaround assertions:

(?<=Registrant's address:).*(?=(?:Registrar:|Registered:))

Working example:

http://regex101.com/r/cN5wP3

Just make sure to use RegexOptions.Singleline.

EDIT:

And to capture the match in the named group value, you would have this:

(?<=Registrant's address:)(?<value>.*)(?=(?:Registrar:|Registered:))

Example:

http://regex101.com/r/fY3oR9
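As a rough illustration, the lookaround pattern with the named group could be applied from C# as below. The names LookaroundDemo and ExtractAddress and the abbreviated sample are hypothetical. The Trim call is an addition, since the captured text includes the whitespace on either side of the address.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Lookbehind anchors the start, lookahead anchors the end; neither is part of the match.
    // Singleline lets . match newlines.
    private static readonly Regex AddressPattern = new Regex(
        @"(?<=Registrant's address:)(?<value>.*)(?=(?:Registrar:|Registered:))",
        RegexOptions.Singleline);

    // Returns the trimmed address, or an empty string when the pattern does not match.
    public static string ExtractAddress(string whois)
    {
        return AddressPattern.Match(whois).Groups["value"].Value.Trim();
    }

    public static void Main()
    {
        // Abbreviated, hypothetical sample shaped like the second record in the question.
        string sample =
            "Registrant's address:\n" +
            "    65 boulevard GD. Charlotte\n" +
            "    Luxembourg\n" +
            "\n" +
            "Registrar:\n" +
            "    Amazon.com [Tag = AMAZON-COM]\n";

        Console.WriteLine(ExtractAddress(sample));
    }
}
```

Two things may be worth checking against real responses. With the colon inside the alternation, a label such as "Registered through:" does not satisfy the lookahead. And because .* is greedy, the capture extends to the last "Registrar:" or "Registered:" in the input.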

3 answers

Answer 1: score 3, 2014-01-16 19:55:14
Answer 2: score 2, 2014-01-16 19:55:22
Answer 3: score 1, accepted, 2014-01-16 20:03:31
