
Extracting a string starting with x and ending with y

First of all, I searched for this and found how to use something like String.Split() to extract a string based on a condition. However, I wasn't able to find how to extract it based on an ending condition as well. For example, I have a file with links to images:

http://i594.photobucket.com/albums/tt27/34/444.jpghttp://i594.photobucket.com/albums/as/asfd/ghjk6.jpg

You will notice that all the images start with http:// and end with .jpg. However, each .jpg is immediately followed by the next http:// without a space, which makes this a little more difficult.

So basically I'm trying to find a way (Regex?) to extract each substring that starts with http:// and ends with .jpg.

Regex is the easiest way to do this. If you're not familiar with regular expressions, you might check out RegexBuddy. It's a relatively cheap little tool that I found extremely useful when I was learning. For your particular case, a possible expression is:

(http://.+?\.jpg)

It probably requires some more refinement, as there are boundary cases that could trip this up, but it would work if the file is a simple list.


You can also test expressions quickly and for free with an online regex tester.


Per your latest comment, if you have links to other non-images as well, then you need to make sure it doesn't start at the http:// for one link and read all the way to the .jpg for the next image. Since URLs are not allowed to have whitespace, you can do it like this:

(http://[^\s]+\.jpg)

This basically says, "match a string starting with http:// and ending with .jpg where there is at least one character between the two and none of those characters are whitespace".
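One caveat worth checking before relying on either expression: + is greedy, so on the back-to-back input from the question [^\s]+ runs through to the last .jpg, while .+? can run across whitespace when other links are mixed in. The sketch below compares the two expressions with a combined variant, http://[^\s]+?\.jpg, which is my own assumption rather than something proposed in the thread. The class name, the helper Extract, and the example.com input are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class JpgLinkPatterns
{
    // Returns every non-overlapping match of pattern in text, in order.
    public static string[] Extract(string text, string pattern)
    {
        return Regex.Matches(text, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Input from the question: two image links with nothing between them.
        string backToBack =
            "http://i594.photobucket.com/albums/tt27/34/444.jpg" +
            "http://i594.photobucket.com/albums/as/asfd/ghjk6.jpg";

        // Hypothetical input: a non-image link, a space, then an image link.
        string mixed = "http://example.com/page.html http://example.com/a.jpg";

        // Greedy, no whitespace: one match covering both back-to-back links.
        Console.WriteLine(Extract(backToBack, @"http://[^\s]+\.jpg").Length);   // 1

        // Lazy, any character: splits the back-to-back links correctly...
        Console.WriteLine(Extract(backToBack, @"http://.+?\.jpg").Length);      // 2

        // ...but spans the space in the mixed input and swallows page.html.
        Console.WriteLine(Extract(mixed, @"http://.+?\.jpg")[0]);

        // Lazy and no whitespace (assumed variant): handles both inputs.
        Console.WriteLine(Extract(backToBack, @"http://[^\s]+?\.jpg").Length);  // 2
        Console.WriteLine(Extract(mixed, @"http://[^\s]+?\.jpg")[0]);           // http://example.com/a.jpg
    }
}
```

If non-image links can also sit directly against image links with no whitespace between them, none of these expressions is enough on its own, and splitting on http:// first is probably the safer route.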

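// Requires: using System.Text.RegularExpressions;
// "subject" is the string that contains the links.
Regex RegexObj = new Regex(@"http://.+?\.jpg");
Match MatchResults = RegexObj.Match(subject);
while (MatchResults.Success) {
    // MatchResults.Value holds the current link; do something with it
    MatchResults = MatchResults.NextMatch();
}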

In your specific case, you could always split it by ".jpg". You will probably end up with one empty element at the end of the array, and you will have to append .jpg to the end of each item if you need the full link. Apart from that, I think it would work.

Tested the following code and it worked fine:

public void SplitTest()
{
    string test = "http://i594.photobucket.com/albums/tt27/34/444.jpghttp://i594.photobucket.com/albums/as/asfd/ghjk6.jpg";
    string[] items = test.Split(new string[] { ".jpg" }, StringSplitOptions.RemoveEmptyEntries);
}

It even gets rid of the empty entry.
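Because Split consumes the separator, each item comes back without its .jpg suffix. A minimal sketch of putting it back, assuming every item really was an image link; the class and method names are hypothetical:

```csharp
using System;
using System.Linq;

public static class SplitAndRestore
{
    // Splits on ".jpg", drops empty pieces, then re-appends the suffix
    // that Split removed from each link.
    public static string[] SplitJpgLinks(string text)
    {
        return text
            .Split(new[] { ".jpg" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(item => item + ".jpg")
            .ToArray();
    }

    public static void Main()
    {
        string test =
            "http://i594.photobucket.com/albums/tt27/34/444.jpg" +
            "http://i594.photobucket.com/albums/as/asfd/ghjk6.jpg";

        foreach (string link in SplitJpgLinks(test))
        {
            Console.WriteLine(link);
        }
        // Prints:
        // http://i594.photobucket.com/albums/tt27/34/444.jpg
        // http://i594.photobucket.com/albums/as/asfd/ghjk6.jpg
    }
}
```

This only holds while the file contains nothing but .jpg links. Any other trailing text would also get .jpg appended to it.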

The following LINQ will separate by http: and make sure to only get values that end with jpg.

var images = from i in imageList.Split(new[] { "http:" },
                                       StringSplitOptions.RemoveEmptyEntries)
             where i.EndsWith(".jpg")
             select "http:" + i;
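For reference, here is a self-contained version of that query that can be run as-is. The wrapper class, the method name, and the example.com input are hypothetical. The Trim() call is my addition, so that a link followed by whitespace still passes the EndsWith check:

```csharp
using System;
using System.Linq;

public static class LinqJpgFilter
{
    // Splits on "http:", keeps only the pieces that end with ".jpg",
    // and restores the "http:" prefix that Split removed.
    public static string[] ImageLinks(string imageList)
    {
        var images = from i in imageList.Split(new[] { "http:" },
                                               StringSplitOptions.RemoveEmptyEntries)
                     let piece = i.Trim()   // assumption: tolerate whitespace between links
                     where piece.EndsWith(".jpg")
                     select "http:" + piece;
        return images.ToArray();
    }

    public static void Main()
    {
        // Hypothetical input: an image, a non-image page, and another image,
        // with no separators between them.
        string imageList =
            "http://example.com/a.jpg" +
            "http://example.com/page.html" +
            "http://example.com/b.jpg";

        foreach (string link in ImageLinks(imageList))
        {
            Console.WriteLine(link);
        }
        // Prints:
        // http://example.com/a.jpg
        // http://example.com/b.jpg
    }
}
```

Unlike the regex approaches, this one copes with non-image links placed directly against image links, because every link is cut at its own http: prefix before the .jpg check runs.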

Regex would work really well for this. Here's an example in C# (and Java) for Regex


 