
Trying to take a url out of a long string

I have a long piece of text that I've pulled from a SQL table and turned into a string:

Thank you for your request.   
Please click the following link to reset your password:
http://localhost:5692/Public/LogonSetPassword.aspx?activationLinkId=603fa657-9460-4417-adc2-7bcad0416c3e
If clicking on the link does not work then please copy and paste it directly into your browser address bar

I'm now trying to extract just the url and put it into another string. I need to grab it starting from "http" and ending at the space right after the unique id.

I've tried:

string activationUrl = sql.Substring(sql.IndexOf("http", sql.IndexOf(" ")));

However it doesn't seem to work. Can someone explain where I'm going wrong please? Thank you.

URLs can contain many characters, but they cannot contain whitespace, so you may be more successful using a regex.

A simple pattern would say "starts with http, followed by one or more non-whitespace characters":

var regex = new Regex(@"http[^\s]+");
Console.WriteLine(regex.Match(sql));

Live example: https://rextester.com/BOV71354
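A slightly fuller sketch of the same idea, checking Match.Success so that a message without a link is reported rather than silently producing an empty string. The helper name TryExtractUrl and the sample text (with "\r\n" line endings) are assumptions made for illustration, not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: finds the first run of non-whitespace characters
// that starts with "http". Returns false when the text has no such run.
static bool TryExtractUrl(string text, out string url)
{
    // \S+ is equivalent to [^\s]+ : one or more non-whitespace characters.
    Match match = Regex.Match(text, @"http\S+");
    url = match.Value; // empty string when nothing matched
    return match.Success;
}

// Sample text modelled on the message in the question (line endings assumed).
string sql =
    "Thank you for your request.   \r\n" +
    "Please click the following link to reset your password:\r\n" +
    "http://localhost:5692/Public/LogonSetPassword.aspx?activationLinkId=603fa657-9460-4417-adc2-7bcad0416c3e \r\n" +
    "If clicking on the link does not work then please copy and paste it directly into your browser address bar";

if (TryExtractUrl(sql, out string activationUrl))
{
    Console.WriteLine(activationUrl);
}
else
{
    Console.WriteLine("No url found.");
}
```

Because the match stops at the first whitespace character, it does not matter whether the url is followed by a space, "\r\n" or "\n".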

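In your attempt, sql.IndexOf(" ") matches the first space in the text, which in your example is at index 5 (right after "Thank"). That index is only used as the position from which the search for "http" starts, so it never marks the end of the url, and Substring with a single argument returns everything up to the end of the string.

Instead, look for the first line break after the first occurrence of http: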

var startIndex = sql.IndexOf("http", StringComparison.Ordinal);
var endIndex = sql.IndexOf('\r', startIndex); // maybe '\n' or ' '

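Also, the second argument of Substring is a length, not an index. Since endIndex points at the character just after the url, the length is endIndex - startIndex and the correct code is:

var url = sql.Substring(startIndex, endIndex - startIndex);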
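Putting the IndexOf steps together, a minimal sketch might look like the following. The helper name ExtractUrlByIndex is hypothetical, and the choice to stop at a space, '\r' or '\n' (whichever comes first) is an assumption that covers the cases mentioned in the comment above.

```csharp
using System;

// Hypothetical helper: cuts the url out using only IndexOf and Substring.
static string ExtractUrlByIndex(string text)
{
    int start = text.IndexOf("http", StringComparison.Ordinal);
    if (start < 0)
    {
        return string.Empty; // no "http" anywhere in the text
    }

    // The url ends at the first space, carriage return or line feed after it.
    int end = text.IndexOfAny(new[] { ' ', '\r', '\n' }, start);
    if (end < 0)
    {
        end = text.Length; // the url runs to the end of the text
    }

    // The second argument is a length: characters start .. end - 1.
    return text.Substring(start, end - start);
}

string sample =
    "Please click the following link to reset your password:\r\n" +
    "http://localhost:5692/Public/LogonSetPassword.aspx?activationLinkId=603fa657-9460-4417-adc2-7bcad0416c3e\r\n" +
    "If clicking on the link does not work then please copy and paste it";

Console.WriteLine(ExtractUrlByIndex(sample));
```

The two guards matter because IndexOf and IndexOfAny return -1 when nothing is found, and passing a negative value to Substring throws an ArgumentOutOfRangeException.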

But the cleanest way to do this is to use a regex:

// Assuming there is only one url and it sits alone on a single line.
var regex = new Regex(@"^http.*\r?$", RegexOptions.Multiline);
var match = regex.Match(sql);
if (match.Success)
{
    // .* also consumes a trailing '\r' (and any trailing spaces), so trim them.
    var url = match.Value.TrimEnd();
}

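This solution assumes that there is only one url and that it is followed by a line break.

var indexOfHttp = sql.IndexOf("http");
var strStartingFromHttp = sql.Substring(indexOfHttp);
// Trim() drops the '\r' left behind when the text uses "\r\n" line endings.
var activationUrl = strStartingFromHttp.Substring(0, strStartingFromHttp.IndexOf('\n')).Trim();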

https://dotnetfiddle.net/tnUTPk

I am not exactly sure what you mean by just the url. Your code finds the first " " in the string (on the first line, right after "Thank"), searches for "http" starting from that position (it is found on the third line), and then takes everything from there to the end of the string, because Substring with a single argument has no end position.

If the url is always going to be on a separate line, and you will only have one url, you can simply split the string by new line and check if that line starts with http:

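string url = null;
foreach (string line in sql.Split('\n'))
{
    // Trim() removes a trailing '\r' and any surrounding spaces.
    string candidate = line.Trim();
    if (candidate.StartsWith("http", StringComparison.OrdinalIgnoreCase))
    {
        url = candidate;
        break;
    }
}
if (url != null)
{
    // You found a url
}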

In this case, url will be "http://localhost:5692/Public/LogonSetPassword.aspx?activationLinkId=603fa657-9460-4417-adc2-7bcad0416c3e".
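The same line-by-line idea can also be written with LINQ. This is only a sketch: the helper name FindUrlLine is hypothetical, and it assumes the url occupies a line of its own, possibly with surrounding spaces.

```csharp
using System;
using System.Linq;

// Hypothetical helper: returns the first line that starts with "http",
// or an empty string when no line does.
static string FindUrlLine(string text)
{
    return text
        // Splitting on both '\r' and '\n' handles either line-ending style.
        .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(line => line.Trim())
        .FirstOrDefault(line => line.StartsWith("http", StringComparison.OrdinalIgnoreCase))
        ?? string.Empty;
}

string message =
    "Thank you for your request.   \r\n" +
    "Please click the following link to reset your password:\r\n" +
    "http://localhost:5692/Public/LogonSetPassword.aspx?activationLinkId=603fa657-9460-4417-adc2-7bcad0416c3e\r\n" +
    "If clicking on the link does not work then please copy and paste it";

Console.WriteLine(FindUrlLine(message));
```

Unlike the regex and IndexOf approaches, this version will not find a url that appears in the middle of a line.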

var regex = new Regex(@"https?://(www.)?[-a-zA-Z0-9@:%._+~#=]{1,256}.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_+.~#?&//=]*)", RegexOptions.Compiled);
var activationUrl = regex.Match(sql)?.Value;

https://dotnetfiddle.net/Cz16QR
