Simple regex question (regex included)

Question

I have strings like this in a string stream:

"do=whoposted&amp;t=1934067" rel=nofollow>61</A></TD><TD class=alt2 align=middle>5,286</TD></TR><TR><TD id=td_threadstatusicon_1911046 class=alt1><IMG id=thread_statusicon_1911046 border=0 alt="" src="http://url.com/forum/images/statusicon/thread_new.gif"> </TD><TD class=alt2><IMG title=Film border=0 alt=Film src="http://url.com/forum/images/icons/new.png"></TD><TD id=td_threadtitle_1911046 class=alt1 title="http://lulzimg.com/i14/7bd11b.jpg &#10; &#10;Complete name : cool-thread.."><DIV><A id=thread_gotonew_1911046 href="http://url.com/forum/f80/cool-topic-new/"><IMG class=inlineimg title="Go to first new post" border=0 alt="Go to first new post" src="http://url.com/forum/images/buttons/firstnew.gif"></A> [MULTI] <A style="FONT-WEIGHT: bold" id=thread_title_1911046 href="http://url.com/forum/f80/cool-topic-name-1911046/">Cool Topic Name</A> </DIV><DIV class=smallfont><SPAN style="CURSOR: pointer" onclick="window.open('http://url.com/forum/members/u2031889/', '_self')">m3no</SPAN> </DIV></TD><TD class=alt2 title="Replies: 11, Views: 1,554"><DIV style="TEXT-ALIGN: right; WHITE-SPACE: nowrap" class=smallfont>Today <SPAN class=time>08:04 AM</SPAN><BR>by <A href="http://url.com/forum/members/u1131830/" rel=nofollow>karetsos</A> <A "

Currently I'm using this:

Regex pattern = new Regex ( "<A\\s+href=\"([^\"]*)\">([^\\x00]*?)\\s+id=thread_title_(\\S+)</A>" );

MatchCollection matches = pattern.Matches ( doc.ToString ( ) );

foreach ( Match match in matches )
{
    int id = Convert.ToInt32 ( match.Groups [ 1 ].Value );

    string name = match.Groups [ 3 ].Value;
    string link = match.Groups [ 2 ].Value;

    ...
}

But it doesn't match anything.

What I want to extract is:

IDs: 942321, 512147.

Names: "Visible Thread Name", "Cool Thread"

Links: "http://url.com/forum/f80/new-topic-name-942321", "http://url.com/forum/f80/cool-topic-name-512147"

Any ideas on how to fix it?

Answer 1

This will return what you need. There's no need to be too strict here:

<a.+href=".*topic\-name\-(\S+)\/.+thread_title_(\S+)"

Answer 2

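Here is the list of problems I found:

Regular expressions are case-sensitive by default (a != A). One possible fix is to pass RegexOptions.IgnoreCase as the second argument to the Regex constructor.
id=thread...: you seem to be missing the opening " after id=
After matching the id you stop abruptly. Don't you want to match the name in the third group? I think your regex should end like this:
```
 id=\"thread_title_([0-9]+)\">([^<]+)</a> 
```
Oh, and don't close the a tag right after the href, because the thread_title id is still inside the tag:
href=\"([^\"]*)\"> : remove the final >
Also, remove that odd [^\\x00]*? group. What is it good for?
After capturing the thread_title id, you need to skip everything up to the closing > so that the style=... attribute is ignored.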
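The first point in the list is easy to check in isolation. The following is a minimal sketch, not part of the original answer; the class name `CaseDemo` and the helper `FindsAnchor` are made up for illustration, and the input is a shortened piece of the markup from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaseDemo
{
    // Hypothetical helper: true when the input contains "<a" followed by whitespace,
    // matched with the given options.
    public static bool FindsAnchor(string input, RegexOptions options)
    {
        return new Regex(@"<a\s", options).IsMatch(input);
    }

    public static void Main()
    {
        // Shortened from the sample markup, which uses an upper-case <A ...> tag.
        string html = "<A style=\"FONT-WEIGHT: bold\" id=thread_title_1911046>";

        Console.WriteLine(FindsAnchor(html, RegexOptions.None));       // False
        Console.WriteLine(FindsAnchor(html, RegexOptions.IgnoreCase)); // True
    }
}
```

With the default options the lower-case `<a` never matches the upper-case tags in the sample, which by itself is enough to produce zero matches.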

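Complete solution (warning, spoilers ahead). The @"..." syntax means you don't need to escape backslashes (but you do need to escape quotes by doubling them).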

Regex pattern = new Regex (@"<a\s+href=""([^""]*)""\s+id=""thread_title_([0-9]+)""[^>]*>([^<]+)</a>", RegexOptions.IgnoreCase);
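Note that the pattern above expects href directly after the tag name and a quoted id, whereas the sample markup in the question has style first, then an unquoted id, then href. As a rough sketch only, assuming the attribute order seen in that sample, a variant could look like the following. The class name `ThreadTitleSketch` and the method `Extract` are hypothetical, and the pattern is an adaptation rather than the answerer's own.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ThreadTitleSketch
{
    // Assumption taken from the sample markup: the id attribute comes before href,
    // and the id value may or may not be quoted. Other attributes (such as style)
    // are skipped with [^>]*? so the match never leaves the opening tag.
    private static readonly Regex Pattern = new Regex(
        @"<a\b[^>]*?\bid=""?thread_title_([0-9]+)""?[^>]*?\bhref=""([^""]*)""[^>]*>([^<]+)</a>",
        RegexOptions.IgnoreCase);

    // Returns (id, link, name) for every thread title anchor found in the input.
    public static List<Tuple<int, string, string>> Extract(string html)
    {
        var results = new List<Tuple<int, string, string>>();
        foreach (Match match in Pattern.Matches(html))
        {
            int id = int.Parse(match.Groups[1].Value);
            string link = match.Groups[2].Value;
            string name = match.Groups[3].Value;
            results.Add(Tuple.Create(id, link, name));
        }
        return results;
    }

    public static void Main()
    {
        // One anchor copied from the sample in the question.
        string html = "<A style=\"FONT-WEIGHT: bold\" id=thread_title_1911046 " +
                      "href=\"http://url.com/forum/f80/cool-topic-name-1911046/\">Cool Topic Name</A>";

        foreach (var t in Extract(html))
        {
            // Prints: 1911046 | http://url.com/forum/f80/cool-topic-name-1911046/ | Cool Topic Name
            Console.WriteLine(t.Item1 + " | " + t.Item2 + " | " + t.Item3);
        }
    }
}
```

Regex-based extraction like this depends on the exact attribute order the forum emits. If the markup varies, an HTML parser is likely to be more robust.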

By the way, to debug this I used the following tool, which I can recommend. It also provides the escaped version automatically:

http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

Answer 1: 1 · 2011-03-17 15:27:09
Answer 2: 1 · accepted · 2011-03-17 15:55:20
