
Regex - catch unknown number of words in between

I have the following strings:

  • 2011 Trieste MED clean/crude/crude
  • 2013 Trieste fo/crude/crude
  • 2013 Ningbo East Pacific cca/cf/ce
  • 2014 Agioi theodoroi MED cde/fo/ce

What I actually want to do is capture Trieste MED (first string), Trieste (second string), Ningbo East Pacific (third string) and Agioi theodoroi MED (fourth string) as one group called open port. There are usually 1 to 4 words between the year (e.g. 2013) and the cargo list (e.g. crude/crude/crude).

This is what I have tried so far: https://regex101.com/r/mYevqd/1

But this is error-prone, because it assumes that the words of the open port group are separated by at most one or two spaces, which is not always true. If I use \s* instead, the first letter of clean gets captured as well, which is wrong. Is there a better approach?

You can simplify your regex to this:

^(?<YearBuilt>\d{4})\s+(?<OpenPort>.*)\s+(?<LastCargos>[^ ]+)$

The first thing in the string is a year, so use \d{4}. The last thing you want to capture is something like clean/crude/crude, which [^ ]+ (one or more non-space characters) matches. The text in between, such as Ningbo East Pacific, can then be captured with .*
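For reference, here is a minimal C# sketch of how that pattern could be used to read the three named groups. The class name PortParser and the helper TryParse are made up for this illustration and are not part of the original answer. One assumption worth flagging: because .* is greedy, OpenPort can pick up trailing spaces when several spaces precede the cargo list, so the sketch calls Trim() on it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PortParser
{
    // Pattern copied verbatim from the answer above.
    private static readonly Regex LinePattern = new Regex(
        @"^(?<YearBuilt>\d{4})\s+(?<OpenPort>.*)\s+(?<LastCargos>[^ ]+)$");

    // Hypothetical helper: returns false (and empty strings) when the line does not match.
    public static bool TryParse(string line, out string year, out string port, out string cargos)
    {
        var match = LinePattern.Match(line);
        if (!match.Success)
        {
            year = port = cargos = string.Empty;
            return false;
        }

        year = match.Groups["YearBuilt"].Value;
        // Trim because the greedy .* may keep extra spaces before the last token.
        port = match.Groups["OpenPort"].Value.Trim();
        cargos = match.Groups["LastCargos"].Value;
        return true;
    }

    public static void Main()
    {
        var lines = new[]
        {
            "2011 Trieste MED clean/crude/crude",
            "2013 Trieste fo/crude/crude",
            "2013 Ningbo East Pacific cca/cf/ce",
            "2014 Agioi theodoroi MED cde/fo/ce"
        };

        foreach (var line in lines)
        {
            if (TryParse(line, out var year, out var port, out var cargos))
                Console.WriteLine($"{year} | {port} | {cargos}");
            else
                Console.WriteLine("No match: " + line);
        }
    }
}
```

Note that [^ ]+ only excludes the space character, so a line ending in a trailing space would not match unless it is trimmed first.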


Let me know if this works for your other strings.

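// Regex lives in System.Text.RegularExpressions; "using static" allows calling WriteLine directly.
using System.Text.RegularExpressions;
using static System.Console;

var strings = new[] {
    "2011 Trieste MED clean/crude/crude",
    "2013 Trieste fo/crude/crude",
    "2013 Ningbo East Pacific cca/cf/ce",
    "2014 Agioi theodoroi MED cde/fo/ce"
};
// Skip the leading year, then capture greedily, backing off until what
// follows is whitespace and then text containing "/" (the cargo list).
var pattern = @"^\d+\s+(.+)(?=\s+.*?/)";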
foreach (var s in strings)
{
    var match = Regex.Match(s, pattern);
    if (match.Success)
        WriteLine(match.Groups[1].Value);
    else
        WriteLine("No matches found.");
}
/*
Output:
    Trieste MED
    Trieste
    Ningbo East Pacific
    Agioi theodoroi MED
*/

If you'll allow me...

Not every text-based problem needs a regex thrown at it. Quite often you can just use, e.g., Split() and a few other purpose-built statements to reach your goal. This can be much easier to write (and to read 6 months later) than trying to beat a sometimes unreadable regex into submission.

Here's how:

public static void Main()
{
    var strings = new[] {"2011 Trieste MED clean/crude/crude",
                         "2013 Trieste fo/crude/crude",
                         "2013 Ningbo East Pacific cca/cf/ce",
                         "2014 Agioi theodoroi MED cde/fo/ce"};

    foreach (var s in strings)
        Console.WriteLine(GetName(s));
}

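public static string GetName(string s)
{
    // Split the line into words on single spaces.
    var allWords = s.Split(' ');
    // Drop the first word (the year) and the last word (the cargo list).
    var nameWords = allWords.Skip(1).Take(allWords.Length - 2);
    // Glue the remaining words back together.
    return string.Join(" ", nameWords);
}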

Skip() and Take() are LINQ extension methods, available after adding using System.Linq; to the C# file.

See it running: https://dotnetfiddle.net/FTBcfC
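One caveat, given that the question mentions words separated by more than one space: Split(' ') produces empty entries for repeated spaces, which would then end up in the joined name. A possible variant, sketched below, passes StringSplitOptions.RemoveEmptyEntries to drop them. The names PortSplitter and GetNameTolerant are made up for this illustration, and the choice to return an empty string for lines with fewer than three words is an assumption, not something the original answer specifies.

```csharp
using System;
using System.Linq;

public static class PortSplitter
{
    // Hypothetical variant of GetName that tolerates runs of spaces or tabs.
    public static string GetNameTolerant(string s)
    {
        // RemoveEmptyEntries discards the empty strings that repeated separators produce.
        var allWords = s.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

        // Assumption: a valid line has at least a year, one name word and a cargo list.
        if (allWords.Length < 3)
            return string.Empty;

        // Drop the first word (year) and the last word (cargo list), then rejoin.
        var nameWords = allWords.Skip(1).Take(allWords.Length - 2);
        return string.Join(" ", nameWords);
    }

    public static void Main()
    {
        Console.WriteLine(GetNameTolerant("2013   Ningbo East  Pacific    cca/cf/ce"));
    }
}
```

A side effect of this approach is that the spacing inside the name is normalized to single spaces, which may or may not be what you want.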
