
C#, extracting strings using regex or string splitting

After reading the answers to this question: C# regex pattern to extract urls from given string - not full html urls but bare links as well, I want to know which is the fastest way to extract URLs from a document: regex matching or the string split method.

So, you have a string containing an HTML document and want to extract the URLs.

The regex way would be:

Regex linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
foreach(Match m in linkParser.Matches(rawString))
    MessageBox.Show(m.Value); 

And the string split method:

string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
var links = rawString.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
foreach (string s in links)
    MessageBox.Show(s);
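
To compare what the two snippets above return without popping up message boxes, they can be wrapped in functions that return lists. This is only a sketch: the names ExtractWithRegex and ExtractWithSplit are hypothetical, the prefix checks use ordinal comparison as an assumption, and the code assumes a console project with C# 9 or later (top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Same pattern as in the question, built once and reused for every call.
var linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b",
    RegexOptions.Compiled | RegexOptions.IgnoreCase);

// Hypothetical helper: collect every regex match as a string.
List<string> ExtractWithRegex(string text) =>
    linkParser.Matches(text).Cast<Match>().Select(m => m.Value).ToList();

// Hypothetical helper: split on whitespace and keep tokens with a URL-like prefix.
// Ordinal comparison is an assumption; the question's code uses the default overload.
List<string> ExtractWithSplit(string text) =>
    text.Split(new[] { '\t', '\n', ' ' }, StringSplitOptions.RemoveEmptyEntries)
        .Where(token => token.StartsWith("http://", StringComparison.Ordinal)
                     || token.StartsWith("https://", StringComparison.Ordinal)
                     || token.StartsWith("www.", StringComparison.Ordinal))
        .ToList();

string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";

Console.WriteLine("Regex:");
Console.WriteLine(string.Join(Environment.NewLine, ExtractWithRegex(rawString)));
Console.WriteLine("Split:");
Console.WriteLine(string.Join(Environment.NewLine, ExtractWithSplit(rawString)));
```

For the sample string both functions return the same three links. They are not strictly equivalent, though: the trailing \b in the pattern drops punctuation at the end of a link, and RegexOptions.IgnoreCase accepts upper-case prefixes, while the split version keeps trailing punctuation and matches prefixes case-sensitively.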

Which of the two is the more performant way to do it?

Split is faster. Here is some code that you can test with: dotnetfiddle link

using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        const int iterations = 500;
        string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";

        // Build the regex once, outside the timed loop, so the timing covers matching rather than construction.
        Regex linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
        char[] separators = "\t\n ".ToCharArray();

        // Accumulates match lengths so that both loops actually consume their results.
        int found = 0;

        Stopwatch sw = Stopwatch.StartNew();

        for (int i = 0; i < iterations; i++)
        {
            // Iterating the MatchCollection forces the matches to be computed.
            foreach (Match m in linkParser.Matches(rawString))
                found += m.Length;
        }

        sw.Stop();

        // TotalMilliseconds keeps fractions; ElapsedMilliseconds would round short runs down to 0.
        var test1Time = sw.Elapsed.TotalMilliseconds;

        sw.Restart();

        for (int i = 0; i < iterations; i++)
        {
            var links = rawString.Split(separators, StringSplitOptions.RemoveEmptyEntries).Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));

            // Where is lazy, so enumerate the result to make the filter run.
            foreach (string s in links)
                found += s.Length;
        }

        sw.Stop();

        var test2Time = sw.Elapsed.TotalMilliseconds;

        Console.WriteLine("Regex Test: " + test1Time.ToString());
        Console.WriteLine("Split Test: " + test2Time.ToString());
        Console.WriteLine("Characters matched: " + found.ToString());
    }
}
