Find & Replace Multiple Instagram URLs in a String Using C#

I want to find all the Instagram URLs within a string and replace them with the embed URL.

Performance matters, though: this could run over 5 to 20 posts, each up to 6000 characters long and containing an unknown number of Instagram URLs that need converting.

URL examples (any of these could appear in each string, so all of them need to match):

http://instagram.com/p/xPnQ1ZIY2W/?modal=true
http://instagram.com/p/xPnQ1ZIY2W/
http://instagr.am/p/xPnQ1ZIY2W/

And this is what I need to replace them with (an embedded version):

<img src="http://instagram.com/p/xPnQ1ZIY2W/media/?size=l" class="instagramimage" />

I was thinking about using a regex, but is that the quickest and most performant way of doing this?

Any examples greatly appreciated.

Something like:

Regex reg = new Regex(@"http://instagr\.?am(?:\.com)?/\S*");

Edited regex. However, I would combine this with a StringReader and process the text line by line, then put each line (modified or not) into a StringBuilder:

string original = @"someotherText http://instagram.com/p/xPnQ1ZIY2W/?modal=true some other text
some other text http://instagram.com/p/xPnQ1ZIY2W/ some other text
some other text http://instagr.am/p/xPnQ1ZIY2W/ some other text";

StringBuilder result = new StringBuilder();

using (StringReader reader = new StringReader(original))
{
    string line;
    // ReadLine returns null once the input is exhausted.
    while ((line = reader.ReadLine()) != null)
    {
        // The evaluator runs once per match, so a line with several URLs
        // gets each URL wrapped in its own tag. Lines without a match
        // come back unchanged.
        result.AppendLine(reg.Replace(line,
            m => string.Format("<img src=\"{0}\" class=\"instagramimage\" />", m.Value)));
    }
}

Console.WriteLine(result.ToString());

You mean like this?

using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    private static Regex reg = new Regex(@"http://instagr\.?am(?:\.com)?/\S*", RegexOptions.Compiled);
    private static Regex idRegex = new Regex(@"(?<=p/).*?(?=/)",RegexOptions.Compiled);

    static void Main(string[] args)
    {
        string original = @"someotherText  http://instagram.com/p/xPnQ1ZIY2W/?modal=true some other text
some other text http://instagram.com/p/xPnQ1ZIY2W/ some other text
some other text http://instagr.am/p/xPnQ1ZIY2W/ some other text";

        StringBuilder result = new StringBuilder();

        using (StringReader reader = new StringReader(original))
        {
            string line;
            // ReadLine returns null once the input is exhausted.
            while ((line = reader.ReadLine()) != null)
            {
                // The evaluator runs once per match, so each URL on the line
                // is rebuilt from its own post ID. Lines without a match
                // come back unchanged.
                result.AppendLine(reg.Replace(line,
                    m => string.Format("<img src=\"http://instagram.com/p/{0}/media/?size=l\" class=\"instagramimage\" />", idRegex.Match(m.Value).Value)));
            }
        }

        Console.WriteLine(result.ToString());



    }
}
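
The two regexes above could arguably be folded into one: a capture group can pick out the post ID while the same pattern matches the whole URL, and `Regex.Replace` can then process the entire string in a single call rather than line by line. What follows is a minimal sketch, assuming the post ID never contains a slash, question mark, whitespace or quote character. The names `InstagramEmbedder` and `Embed`, and the decision to also accept `https`, are illustrative and not part of the answers above.

```csharp
using System;
using System.Text.RegularExpressions;

static class InstagramEmbedder
{
    // Group 1 captures the post ID. The trailing [^\s"<]* swallows an optional
    // slash and query string such as "/?modal=true".
    private static readonly Regex UrlRegex = new Regex(
        @"https?://instagr(?:am\.com|\.am)/p/([^/\s""<?]+)[^\s""<]*",
        RegexOptions.Compiled);

    public static string Embed(string text)
    {
        // "$1" in the replacement string is substituted with the captured ID.
        return UrlRegex.Replace(text,
            "<img src=\"http://instagram.com/p/$1/media/?size=l\" class=\"instagramimage\" />");
    }
}
```

Calling `InstagramEmbedder.Embed(post)` on each post replaces every matching URL in one pass; text without Instagram URLs is returned unchanged.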

A well-crafted, compiled regular expression is hard to beat, especially since you're doing replacements, not just searching, but you should test to be sure.
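
One rough way to run that test is to time the replacement with `Stopwatch` over input shaped like the posts described in the question. The helper below is only a sketch: `RegexTiming`, `BuildSamplePost` and `Time` are hypothetical names, and the synthetic post is an assumption about what real input looks like.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

static class RegexTiming
{
    // Builds a synthetic post of at least the requested length,
    // with Instagram URLs scattered through it.
    public static string BuildSamplePost(int approxLength)
    {
        var sb = new StringBuilder();
        while (sb.Length < approxLength)
        {
            sb.Append("some other text http://instagram.com/p/xPnQ1ZIY2W/?modal=true ");
        }
        return sb.ToString();
    }

    // Returns the total elapsed time for running the replacement `iterations` times.
    public static TimeSpan Time(Regex regex, string input, string replacement, int iterations)
    {
        // Warm-up call so one-off setup cost is not included in the measurement.
        regex.Replace(input, replacement);

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            regex.Replace(input, replacement);
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Calling `RegexTiming.Time` once with a regex built with `RegexOptions.Compiled` and once with `RegexOptions.None`, both on the output of `BuildSamplePost(6000)`, gives a rough comparison. For numbers you can rely on, a dedicated benchmarking harness is preferable to a hand-rolled loop.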

If the Instagram URLs appear only within HTML attributes, here's my first stab at a pattern to look for:

(?<=")(https?://instagr[^">]+)

(I added a check for https as well, which you didn't mention but I believe is supported by Instagram.)

Some false positives are theoretically possible, but it will perform better than pedantically matching every legal variation of an Instagram URL. (The ">" check is just in case the HTML is missing the end quote for some reason.)
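
As an illustration of how that pattern might be applied, the sketch below rewrites each matched attribute value to the media URL. It assumes that, inside an attribute, the goal is to swap the URL itself instead of inserting a whole `<img>` tag. The names `AttributeUrlRewriter`, `Rewrite` and the second `PostId` regex are hypothetical additions, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class AttributeUrlRewriter
{
    // The pattern from the answer: a URL that starts right after a double quote
    // and runs up to the next quote or ">".
    private static readonly Regex AttrUrl = new Regex(
        @"(?<="")(https?://instagr[^"">]+)", RegexOptions.Compiled);

    // Pulls the post ID out of a matched URL (the path segment after "/p/").
    private static readonly Regex PostId = new Regex(@"/p/([^/?]+)", RegexOptions.Compiled);

    public static string Rewrite(string html)
    {
        return AttrUrl.Replace(html, m =>
        {
            Match id = PostId.Match(m.Value);
            // A match without a recognisable post ID is treated as a false
            // positive and left untouched.
            return id.Success
                ? "http://instagram.com/p/" + id.Groups[1].Value + "/media/?size=l"
                : m.Value;
        });
    }
}
```

The second regex only runs on the short matched URL, so the loose first pattern still does the bulk of the scanning; the `id.Success` check is what keeps the theoretical false positives from being rewritten.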
