Remove Everything Between Two Characters As Long As They Aren't Inside Some Other Characters
Regex to remove everything between two characters
I have the following string:
"<a href=\"/formentries/formfile/13978\" target=\"_blank\">dog-00.jpg|image/jpeg</a> <a href='/FormEntries/Delete' class='btnDeleteAttachment' data-form-entry-id='366793' data-attachment-id='13978'> [remove]</a><br /><a href=\"/formentries/formfile/13979\" target=\"_blank\">dog-01.docx|application/vnd.openxmlformats-officedocument.wordprocessingml.document</a> <a href='/FormEntries/Delete' class='btnDeleteAttachment' data-form-entry-id='366793' data-attachment-id='13979'> [remove]</a><br /><a href=\"/formentries/formfile/13980\" target=\"_blank\">dog-02.png|image/png</a> <a href='/FormEntries/Delete' class='btnDeleteAttachment' data-form-entry-id='366793' data-attachment-id='13980'> [remove]</a>"
Formatted nicely, it looks something like this:
<a href=\"/formentries/formfile/13978\" target=\"_blank\">dog-00.jpg|image/jpeg</a>
<a href='/FormEntries/Delete' class='btnDeleteAttachment' data-form-entry-id='366793' data-attachment-id='13978'> [remove]</a>
<br />
<a href=\"/formentries/formfile/13979\" target=\"_blank\">dog-01.docx|application/vnd.openxmlformats-officedocument.wordprocessingml.document</a>
<a href='/FormEntries/Delete' class='btnDeleteAttachment' data-form-entry-id='366793' data-attachment-id='13979'> [remove]</a>
<br />
<a href=\"/formentries/formfile/13980\" target=\"_blank\">dog-02.png|image/png</a>
<a href='/FormEntries/Delete' class='btnDeleteAttachment' data-form-entry-id='366793' data-attachment-id='13980'> [remove]</a>
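So I have a bunch of anchor tags with breaks between them. In the text of each anchor, I want to remove the pipe character and the file type:
dog-00.jpg|image/jpeg
becomes
dog-00.jpg
The regex should also work for any future file types, for example:
dog-01.docx|application/vnd.openxmlformats-officedocument.wordprocessingml.document
becomes
dog-01.docx
I still need the complete anchors, so after the file types are removed the text becomes: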
<a href=\"/formentries/formfile/13978\" target=\"_blank\">dog-00.jpg</a>
<a href='/FormEntries/Delete' class='btnDeleteAttachment' data-form-entry-id='366793' data-attachment-id='13978'> [remove]</a>
<br />
<a href=\"/formentries/formfile/13979\" target=\"_blank\">dog-01.docx</a>
<a href='/FormEntries/Delete' class='btnDeleteAttachment' data-form-entry-id='366793' data-attachment-id='13979'> [remove]</a>
<br />
I'm not very good with regex; I've tried various combinations, but they all failed.
Don't use regex to parse complex HTML; use HtmlAgilityPack instead. I would also use string methods such as Contains, IndexOf and Remove instead of regex:
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(html); // pass in your HTML string
var links = doc.DocumentNode.SelectNodes("//a[@href]"); // SelectNodes returns null when nothing matches
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        string text = link.InnerText;
        if (text.Contains("|"))
            link.InnerHtml = text.Remove(text.IndexOf('|')); // you can't modify InnerText directly but this works
    }
}
string result = doc.DocumentNode.OuterHtml; // your desired result
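Input:
dog-00.jpg|image/jpeg
Regex that matches only the part before the | (pipe) character:
([^|]+)
Description:
The regex above matches a run of one or more non-pipe characters. Regex.Match returns the first such run, which here is everything before the first pipe character.
C# code: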
using System.Text.RegularExpressions;

var input = @"dog-00.jpg|image/jpeg";
var regex = new Regex(@"([^|]+)");
var m = regex.Match(input);
string name = null;
if (m.Success)
{
    name = m.Groups[1].Value; // "dog-00.jpg"
}
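Because `([^|]+)` only finds the first run of non-pipe characters, it has to be applied to each anchor's text on its own, not to the whole HTML string (there it would simply match everything from the start of the string up to the first pipe, tags included). A minimal sketch wrapping the snippet above in a helper; `FileNameHelper` and `StripType` are hypothetical names, and the assumption is that you have already extracted the anchor text (for example with the HtmlAgilityPack loop in the previous answer):

```csharp
using System.Text.RegularExpressions;

public static class FileNameHelper
{
    // Same pattern as above: the first run of characters that are not '|'.
    // For text such as "dog-00.jpg|image/jpeg" that is the part before the pipe.
    private static readonly Regex BeforePipe = new Regex(@"([^|]+)");

    // Hypothetical helper: returns the first non-pipe run,
    // or the input unchanged if there is no match (e.g. an empty string).
    public static string StripType(string anchorText)
    {
        Match m = BeforePipe.Match(anchorText);
        return m.Success ? m.Groups[1].Value : anchorText;
    }
}
```

Since only the part before the pipe is kept, the length of the MIME type does not matter, so `StripType("dog-01.docx|application/vnd.openxmlformats-officedocument.wordprocessingml.document")` gives `dog-01.docx` just as `StripType("dog-00.jpg|image/jpeg")` gives `dog-00.jpg`.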
EDIT:
If this is only a matter of splitting the string on the pipe character, Dylan Nicholson's variant using input.Split (or .Substring + .IndexOf) may well be more efficient than a regex.
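Dylan Nicholson's code isn't reproduced here, so the following is only a sketch of what the two non-regex variants might look like; `PipeSplit`, `BeforeFirstPipeSplit` and `BeforeFirstPipe` are hypothetical names. Both keep the text before the first pipe and return the input unchanged when there is no pipe:

```csharp
public static class PipeSplit
{
    // Variant 1: split on '|' and keep the first piece.
    // A count of 2 makes Split stop after the first pipe,
    // so the rest of the MIME type is left in one piece.
    public static string BeforeFirstPipeSplit(string input)
    {
        return input.Split(new[] { '|' }, 2)[0];
    }

    // Variant 2: find the first '|' and cut the string there.
    // This avoids allocating the array that Split creates.
    public static string BeforeFirstPipe(string input)
    {
        int pipe = input.IndexOf('|');
        return pipe < 0 ? input : input.Substring(0, pipe);
    }
}
```

Like the regex, these operate on a single anchor text, not on the whole HTML string.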
EDIT2:
Does it have to be a regex? If not, try this:
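using System.Text;

// Cuts from a '|' up to (but not including) the next '<'.
// If several pipes precede that '<', only the last one is used.
public static string Clean(string input)
{
    var sb = new StringBuilder(input);
    int m1 = -1, m2 = -1; // m1: position of the last '|', m2: position of the last '<'
    for (var i = 0; i < sb.Length; i++)
    {
        if (sb[i] == '|')
            m1 = i;
        if (sb[i] == '<')
            m2 = i;
        // A '<' after a '|': remove the "|mime/type" text in between
        if (m1 > -1 && m2 > -1 && m2 > m1)
        {
            sb.Remove(m1, m2 - m1);
            i = m1; // sb[m1] is now the '<'; the loop resumes just after it
            m1 = -1;
            m2 = -1;
        }
    }
    return sb.ToString();
}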
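The loop above calls `StringBuilder.Remove` once per file name, and each call shifts the remainder of the buffer. A possible variant under the same assumption (every pipe to strip is followed, before any other `<`, by the anchor's closing `</a>`) copies only the kept pieces into a new builder in a single pass using `IndexOf`. `PipeStripper` and `StripPipeSuffixes` are hypothetical names, and unlike `Clean` this sketch cuts at the first pipe of each text run rather than the last:

```csharp
using System.Text;

public static class PipeStripper
{
    // Removes every span that starts at a '|' and ends just before the
    // next '<', copying the kept parts into a new StringBuilder.
    public static string StripPipeSuffixes(string html)
    {
        var sb = new StringBuilder(html.Length);
        int pos = 0; // start of the next chunk to keep
        while (true)
        {
            int pipe = html.IndexOf('|', pos);
            if (pipe < 0) break;              // no more pipes
            int lt = html.IndexOf('<', pipe);
            if (lt < 0) break;                // pipe with no following tag: keep the rest as-is
            sb.Append(html, pos, pipe - pos); // keep everything up to the pipe
            pos = lt;                         // skip "|mime/type", resume at the '<'
        }
        sb.Append(html, pos, html.Length - pos);
        return sb.ToString();
    }
}
```

Like `Clean`, it leaves the delete links untouched because their text (` [remove]`) contains no pipe.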
UPDATE
You can use this regex:
(?<=<a[^>]*>[^|]+?)\|.*?(?=</a>)
For C#:
your_string = Regex.Replace(your_string, "(?<=<a[^>]*>[^|]+?)\\|.*?(?=</a>)", "",
RegexOptions.IgnoreCase | RegexOptions.Multiline);
This replaces every match with an empty string, which strips the pipe and the file type from each anchor's text.
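A minimal, self-contained sketch of the replacement above, wrapped in a hypothetical `AnchorCleaner.StripMimeTypes` method. It uses the same pattern, written as a verbatim string. The lookbehind `(?<=<a[^>]*>[^|]+?)` has variable length, which .NET's regex engine supports but some other engines (for example Python's `re` module) do not. `RegexOptions.Multiline` is left out here because it only changes how `^` and `$` behave, and the pattern uses neither:

```csharp
using System.Text.RegularExpressions;

public static class AnchorCleaner
{
    // A '|' preceded by "<a ...>" and at least one non-pipe character,
    // plus everything after it up to (but not including) the next "</a>".
    private static readonly Regex PipeSuffix = new Regex(
        @"(?<=<a[^>]*>[^|]+?)\|.*?(?=</a>)",
        RegexOptions.IgnoreCase);

    // Replaces each "|mime/type" suffix with an empty string.
    public static string StripMimeTypes(string html) => PipeSuffix.Replace(html, "");
}
```

Because `.` does not match a newline without `RegexOptions.Singleline`, the lazy `.*?` cannot run past the end of a line looking for `</a>`.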