
Trying to take a url out of a long string

I have a long string of text that I've isolated from a SQL table and turned into a string:

Thank you for your request.   
Please click the following link to reset your password:
http://localhost:5692/Public/LogonSetPassword.aspx?activationLinkId=603fa657-9460-4417-adc2-7bcad0416c3e
If clicking on the link does not work then please copy and paste it directly into your browser address bar

I'm now trying to take out just the URL and put it into another string. I need to grab it starting at "http" and ending at the whitespace right after the unique id.

I've tried:

string activationUrl = sql.Substring(sql.IndexOf("http", sql.IndexOf(" ")));

However, it doesn't seem to work. Can someone explain where I'm going wrong, please? Thank you.

URLs can contain many characters, but they cannot contain whitespace, so you may be more successful using a regex.

A simple pattern would say "starts with http, followed by one or more non-whitespace characters":

var regex = new Regex(@"http[^\s]+");
Console.WriteLine(regex.Match(sql));

Live example: https://rextester.com/BOV71354
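
As a slightly fuller sketch of the same idea, the pattern can be wrapped in a helper that returns null when nothing matches. The names UrlExtractor and ExtractFirstUrl are made up for illustration, and the pattern is written as http\S+, which is equivalent to http[^\s]+.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlExtractor
{
    // "http" followed by one or more non-whitespace characters.
    private static readonly Regex UrlPattern = new Regex(@"http\S+");

    // Returns the first match in the text, or null if there is none.
    // (Hypothetical helper, not part of the original answer.)
    public static string ExtractFirstUrl(string text)
    {
        Match match = UrlPattern.Match(text);
        return match.Success ? match.Value : null;
    }
}
```

Calling UrlExtractor.ExtractFirstUrl(sql) on the text from the question should give just the link, because \S stops at the line break that follows it.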

In your attempt, sql.IndexOf(" ") matches the first occurrence of a space; in your example that is at index 5 (in "Thank you"). The second argument of IndexOf is only the position where the search for "http" begins, and Substring with a single argument returns everything from that index to the end of the string, so the text after the URL is never cut off.

You have to look for the first occurrence of a new line after the first occurrence of http:

var startIndex = sql.IndexOf("http", StringComparison.Ordinal);
var endIndex = sql.IndexOf('\r', startIndex); // maybe '\n' or ' '

Also, the second argument of Substring is a length, not an index, so the correct code is:

// Length is endIndex - startIndex; the character at endIndex (the '\r') is excluded.
var url = sql.Substring(startIndex, endIndex - startIndex);
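
The index-based approach can be sketched as a small helper that also covers the cases where no "http" is found or the URL runs to the end of the text. The class name ActivationLink, the method name Extract and the set of terminator characters are assumptions made for this example.

```csharp
using System;

public static class ActivationLink
{
    // Characters treated as the end of the URL (assumed: any common whitespace).
    private static readonly char[] Terminators = { ' ', '\t', '\r', '\n' };

    // Hypothetical helper: returns the URL, or null when the text has none.
    public static string Extract(string text)
    {
        int start = text.IndexOf("http", StringComparison.Ordinal);
        if (start < 0)
        {
            return null; // no "http" anywhere in the text
        }

        int end = text.IndexOfAny(Terminators, start);
        if (end < 0)
        {
            end = text.Length; // the URL runs to the end of the string
        }

        // The second argument of Substring is a length, not an end index.
        return text.Substring(start, end - start);
    }
}
```

Using IndexOfAny here avoids having to know in advance whether the line ends with '\r', '\n' or a space.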

But the cleanest way to do this is to use a regex:

// Assuming there is only one url and it sits alone on a single line.
var regex = new Regex(@"^http.*\r?$", RegexOptions.Multiline);
var match = regex.Match(sql);
if (match.Success)
{
    // '.' also matches '\r', so strip it when the text uses "\r\n" line endings.
    var url = match.Value.TrimEnd('\r');
}

This solution assumes that there will be only one URL:

var indexOfHttp = sql.IndexOf("http");
var strStartingFromHttp = sql.Substring(indexOfHttp);
// TrimEnd removes the '\r' left behind when the text uses "\r\n" line endings.
var activationUrl = strStartingFromHttp.Substring(0, strStartingFromHttp.IndexOf('\n')).TrimEnd('\r');

https://dotnetfiddle.net/tnUTPk

I am not exactly sure what you mean by just the URL. The code you are using finds the first " " in the string, searches for "http" from that position onward, and then takes everything from there to the end of the text. The first instance of "http" is on the third line; the first instance of " " is on the first line, right after "Thank".

If the URL is always going to be on a separate line, and you will only have one URL, you can simply split the string by new line and check whether a line starts with http:

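string url = null;
foreach (string line in sql.Split('\n'))
{
    // Case-insensitive check without allocating a lower-cased copy.
    if (line.StartsWith("http", StringComparison.OrdinalIgnoreCase))
    {
        url = line.Trim(); // drop a trailing '\r' from "\r\n" line endings
        break;
    }
}
if (url != null)
{
    // You found a url
}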

In this case "url" will be "http://localhost:5692/Public/LogonSetPassword.aspx?activationLinkId=603fa657-9460-4417-adc2-7bcad0416c3e".
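
The same line-by-line search can also be written with LINQ. This is only a sketch: the names LineSearch and FindUrlLine are invented for the example, and it assumes the text uses "\r\n" or "\n" line endings.

```csharp
using System;
using System.Linq;

public static class LineSearch
{
    // Hypothetical helper: returns the first line that starts with "http"
    // (ignoring case), or null when no line does.
    public static string FindUrlLine(string text)
    {
        return text
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())
            .FirstOrDefault(line => line.StartsWith("http", StringComparison.OrdinalIgnoreCase));
    }
}
```

FirstOrDefault returns null for a string sequence with no match, so the caller can keep the same null check as in the loop version.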

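// Note: the dots outside the character classes are unescaped, so they match any
// character; that is what lets a host without a dot, such as localhost:5692, match.
var regex = new Regex(@"https?://(www.)?[-a-zA-Z0-9@:%._+~#=]{1,256}.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_+.~#?&//=]*)", RegexOptions.Compiled);
// Match never returns null; Value is an empty string when nothing matched.
var activationUrl = regex.Match(sql).Value;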

https://dotnetfiddle.net/Cz16QR
