
Get a substring from a string using regex

I have many strings in this format:

fdg.sdfg.234fdsa.dsf_1.2.5.62.xml
23432ssdfsa_sadfsd_1.2.7.6.xml
3.3.3asdf_ddd_1.2.1.doc

I would like to get only the number:
from: fdg.sdfg.234fdsa.dsf_1.2.5.62.xml to get: 1.2.5.62
from: 23432ssdfsa_sadfsd_1.2.7.6.xml to get: 1.2.7.6
from: 3.3.3asdf_ddd_1.2.1.doc to get: 1.2.1
etc.

This code works:

string test = "4534534ghgggg_1.1.3.4.xml";
int to = test.LastIndexOf('.');
int from = test.LastIndexOf('_') + 1;
Console.WriteLine(test.Substring(from,to - from));

But I would like to know how to do it with a regex. Any ideas?

First, let's spell out the rules for the match (what you want is not just a plain number):

  • is preceded by '_' (not included in the match)
  • contains only digits and dots (no two dots in a row)
  • has no leading and no trailing dot
  • has at least one digit as well as at least one dot
  • is followed by '.' (not included in the match)

then implement a pattern:

 (?<=_)[0-9]+(\.[0-9]+)+(?=\.)

if the number in the question is, in fact, some kind of version, you may want to restrict the number of its parts, e.g.

 (?<=_)[0-9]+(\.[0-9]+){1,3}(?=\.[^0-9])

which means that only versions with 2 to 4 parts ( _d.d. , _d.d.d. and _d.d.d.d. ) are accepted, and the dot that follows must itself be followed by a non-digit. E.g. the input _1.2.15.xml will be accepted (3 parts: 1 , 2 and 15 ), whereas _1.2.3.4.5.xml will be rejected (5 parts).
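To check the part-count restriction, the restricted pattern can be run over an accepted and a rejected input. This is only a sketch: the class name VersionPatternDemo, the helper TryGetVersion and the sample file names are made up for illustration and are not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VersionPatternDemo
{
    // Restricted pattern from the answer: 2 to 4 numeric parts after '_',
    // followed by a dot and a non-digit (e.g. the start of a file extension).
    private const string Pattern = @"(?<=_)[0-9]+(\.[0-9]+){1,3}(?=\.[^0-9])";

    // Hypothetical helper: true when a version is found; 'version' is empty otherwise.
    public static bool TryGetVersion(string input, out string version)
    {
        Match match = Regex.Match(input, Pattern);
        version = match.Value; // Match.Value is an empty string when the match fails
        return match.Success;
    }

    public static void Main()
    {
        string version;

        // 3 parts: accepted, prints 1.2.15
        Console.WriteLine(TryGetVersion("name_1.2.15.xml", out version) ? version : "(no match)");

        // 5 parts: rejected, prints (no match)
        Console.WriteLine(TryGetVersion("name_1.2.3.4.5.xml", out version) ? version : "(no match)");
    }
}
```

The five-part input is rejected as a whole rather than truncated to four parts, because the lookahead refuses any position where the next dot is followed by another digit.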

finally, use regular expressions:

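  // requires: using System.Linq; using System.Text.RegularExpressions;
  string source = ...
  string pattern = @"(?<=_)[0-9]+(\.[0-9]+)+(?=\.)";

  // If there are many matches, let's take the last one.
  // Note: Regex.Matches takes the input first, then the pattern.
  string lastMatch = Regex.Matches(source, pattern)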
    .OfType<Match>()
    .Select(match => match.Value)
    .LastOrDefault();

  Console.Write(lastMatch); 

However, if the format is fixed, then a regular expression (and Linq ) is overkill. LastIndexOf + Substring is a better choice.
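For the fixed-format case, the LastIndexOf + Substring approach from the question could be wrapped in a small helper that also copes with a missing '_' or '.'. This is a rough sketch; the names VersionBySubstring and TryExtract are hypothetical, and the helper does not check that the extracted text really consists of digits and dots.

```csharp
using System;

public static class VersionBySubstring
{
    // Hypothetical helper: takes the text between the last '_' and the last '.'.
    // Returns false when either separator is missing or they are in the wrong order.
    public static bool TryExtract(string fileName, out string version)
    {
        version = string.Empty;

        int from = fileName.LastIndexOf('_') + 1; // 0 when there is no '_'
        int to = fileName.LastIndexOf('.');       // -1 when there is no '.'

        if (from == 0 || to <= from)
        {
            return false;
        }

        version = fileName.Substring(from, to - from);
        return true;
    }

    public static void Main()
    {
        string[] samples =
        {
            "fdg.sdfg.234fdsa.dsf_1.2.5.62.xml",
            "23432ssdfsa_sadfsd_1.2.7.6.xml",
            "3.3.3asdf_ddd_1.2.1.doc"
        };

        foreach (string sample in samples)
        {
            string version;
            Console.WriteLine(TryExtract(sample, out version) ? version : "(no match)");
        }
    }
}
```

With the three sample names from the question this prints 1.2.5.62, 1.2.7.6 and 1.2.1.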

This code seems to work as long as the numbers you are looking for are preceded by "_".

Edited - This is the final working result

        // fdg.sdfg.234fdsa.dsf_1.2.5.62.xml 
        // 23432ssdfsa_sadfsd_1.2.7.6.xml
        // 3.3.3asdf_ddd_1.2.1.doc

        string source = "fdg.sdfg.234fdsa.dsf_1.2.5.62.xml";
        var match = Regex.Match(source, @"_[0-9]+\.[0-9]+\.[0-9]+(\.[0-9]+)*").ToString().Replace("_", "");
        Console.WriteLine(match);
        Console.ReadLine();
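A possible variation on the snippet above is to put the number in a capture group, so the leading underscore does not have to be stripped with Replace afterwards. This is just a sketch: the group name version and the names CaptureGroupDemo and GetVersion are my own choices, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureGroupDemo
{
    // Same shape as the pattern above (at least three numeric parts after '_'),
    // but the number itself is captured in a named group.
    private const string Pattern = @"_(?<version>[0-9]+\.[0-9]+\.[0-9]+(\.[0-9]+)*)";

    // Hypothetical helper: returns the captured number, or an empty string when nothing matches.
    public static string GetVersion(string source)
    {
        Match match = Regex.Match(source, Pattern);
        return match.Success ? match.Groups["version"].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(GetVersion("fdg.sdfg.234fdsa.dsf_1.2.5.62.xml")); // 1.2.5.62
        Console.WriteLine(GetVersion("23432ssdfsa_sadfsd_1.2.7.6.xml"));    // 1.2.7.6
        Console.WriteLine(GetVersion("3.3.3asdf_ddd_1.2.1.doc"));           // 1.2.1
    }
}
```

Like the original snippet, this requires at least three parts, so a two-part number such as _1.2.xml would not be matched.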

You already have all your answers. I have not practised regex for the last 6 months and have forgotten almost everything. Anyway, there are plenty of web sites (look for "regex tester" in your favorite search engine) that help you with regex. I do not know if I may recommend one over another, but here are some screenshots of one example (I am not much of a regex expert, so I hope I did not write anything too wrong).

[four screenshots, not reproduced here]

So now you can test all the answers and advice you have been given.
