C#, extracting strings using regex or string splitting
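After reading the answers to the question "C# regex pattern to extract urls from given string - not full html urls but bare links as well", I'm wondering which is the fastest way to extract URLs from a document: regex matching or the string split method.
So you have a string containing an HTML document, and you want to extract the URLs.
The regex way would be: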
Regex linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
foreach(Match m in linkParser.Matches(rawString))
    MessageBox.Show(m.Value);
And the string split method:
string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
var links = rawString.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
foreach (string s in links)
    MessageBox.Show(s);
Which method is the most efficient?
Split is faster. Here is some code you can use to test it (dotnetfiddle link):
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;
public class Program
{
    public static void Main()
    {
        const int iterations = 500;
        string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
        // Build the regex once, outside the timed loop, so that construction
        // (and RegexOptions.Compiled code generation) is not what gets measured.
        Regex linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
        int regexCount = 0;
        int splitCount = 0;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // Walk the matches so the regex actually runs against the input.
            foreach (Match m in linkParser.Matches(rawString))
                regexCount++;
        }
        sw.Stop();
        var test1Time = sw.Elapsed.TotalMilliseconds;
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            var links = rawString.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
            // Where() is lazy; Count() forces the filter to execute.
            splitCount += links.Count();
        }
        sw.Stop();
        var test2Time = sw.Elapsed.TotalMilliseconds;
        // Printing the counts also confirms that both loops did real work.
        Console.WriteLine("Regex Test: " + test1Time + " ms, " + regexCount + " matches");
        Console.WriteLine("Split Test: " + test2Time + " ms, " + splitCount + " matches");
    }
}
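Speed aside, it may be worth checking that the two approaches return the same strings for your input. The sketch below wraps both snippets from the question in helper methods so their results can be compared side by side. The class name `UrlExtraction` and the method names `ExtractWithRegex` and `ExtractWithSplit` are made up for illustration, and the split version uses `StringComparison.OrdinalIgnoreCase` (an assumption on my part, not in the original code) to mirror the regex's `RegexOptions.IgnoreCase`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative, not from the original post.
public static class UrlExtraction
{
    // Same pattern and options as in the question, built once and reused.
    private static readonly Regex LinkParser = new Regex(
        @"\b(?:https?://|www\.)\S+\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Same separators as the question's "\t\n ".ToCharArray().
    private static readonly char[] Separators = { '\t', '\n', ' ' };

    public static List<string> ExtractWithRegex(string text)
    {
        // The trailing \b makes the match end at a word boundary,
        // so punctuation directly after a URL is left out.
        return LinkParser.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }

    public static List<string> ExtractWithSplit(string text)
    {
        // Assumption: OrdinalIgnoreCase is used here to match the regex's
        // IgnoreCase option; the question's code compared case-sensitively.
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
            .Where(s => s.StartsWith("http://", StringComparison.OrdinalIgnoreCase)
                     || s.StartsWith("https://", StringComparison.OrdinalIgnoreCase)
                     || s.StartsWith("www.", StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

On the sample string from the question both methods return the same three entries. They differ once punctuation touches a URL: for `"see http://example.com, ok"` the regex returns `http://example.com`, while the split version returns the whole token `http://example.com,` including the comma. So if split wins on speed, you may still need a trimming step afterwards.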