C#: How to filter/get the value of a specific column in an HTML table with Regex (get the value between two strings)
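I want to filter the cells whose openspan_CellSchemaId is "941d3c8a-8d5d-42aa-943e-a07ccaba1629" and get the value of each filtered cell. For the first filtered cell the value is empty (only whitespace), and for the second one it is "ROBO7".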
I'm using the regex below in C#, but I can't seem to get the correct values. Any help is appreciated.
Regex regex = new Regex(
"(<TD[^>]*?>^941d3c8a-8d5d-42aa-943e-a07ccaba1629$<\\/TD>)",
RegexOptions.IgnoreCase
| RegexOptions.CultureInvariant
| RegexOptions.IgnorePatternWhitespace
| RegexOptions.Compiled
);
MatchCollection ms = regex.Matches(InputText);
int index = 1;
foreach (Match match in ms)
{
if (match.Groups[2].Value.StartsWith(Search))
return index;
index++;
}
return 0;
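A quick check shows why the attempt above finds nothing. Without RegexOptions.Multiline, ^ matches only at the very start of the input. With Multiline it matches only at the start of a line. In neither case can it come right after the > of a tag, and $ has the same problem before </TD>. The GUID is also an attribute value (openspan_CellSchemaId="..."), not the cell text. The pattern has a single capture group, so match.Groups[2] is always an empty, unsuccessful group. The sketch below uses one cell copied from the table. QuestionPatternCheck and its members are illustrative names only:

```csharp
using System;
using System.Text.RegularExpressions;

static class QuestionPatternCheck
{
    // The pattern from the question, unchanged (verbatim string, so \/ reaches the regex as-is).
    public const string QuestionPattern =
        @"(<TD[^>]*?>^941d3c8a-8d5d-42aa-943e-a07ccaba1629$<\/TD>)";

    public static bool Matches(string input) =>
        Regex.IsMatch(input, QuestionPattern, RegexOptions.IgnoreCase);

    // Group numbers defined by the pattern: 0 (whole match) and 1 (the outer parentheses).
    public static int[] GroupNumbers() =>
        new Regex(QuestionPattern).GetGroupNumbers();

    static void Main()
    {
        // One target cell copied from the question's table.
        string cell = "<TD width=78 noWrap openspan_CellSchemaId=\"941d3c8a-8d5d-42aa-943e-a07ccaba1629\">ROBO7 </TD>";

        Console.WriteLine(Matches(cell));                     // False: ^ can never follow '>'
        Console.WriteLine(string.Join(",", GroupNumbers()));  // 0,1: there is no group 2
    }
}
```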
<TABLE class="mainlayout fixedlayouttable" cellSpacing=0 cellPadding=2 width=1223 summary=* border=0>
  <TBODY>
    <TR openspan_RowSchemaId="efbd02b8-fb2f-4f1c-8091-2d82e9bd0220">
      <TD width=82 noWrap openspan_CellSchemaId="53174ff2-e55c-472b-ae59-6d430afb3f10"><A onclick="doScanNo('G1S1990706')" href="#">G1S1990706</A> </TD>
      <TD width=95 openspan_CellSchemaId="4be80351-39e7-4f8a-be50-9233436a3266">G1:一般支払 </TD>
      <TD width=100 openspan_CellSchemaId="6e5e1343-ef24-4e55-b2a6-9d7c422400c8">R1:QRなし </TD>
      <TD width=84 noWrap openspan_CellSchemaId="99bb881d-2870-4742-9ebc-689337842cd5"> </TD>
      <TD width=160 noWrap openspan_CellSchemaId="26265b76-713e-4543-9bd5-334416e8b0df"><SPAN class=nowraptext><BR></SPAN></TD>
      <TD width=186 noWrap openspan_CellSchemaId="785a8e77-1803-4123-94d4-f878f372b86d"><SPAN class=nowraptext><BR></SPAN></TD>
      <TD width=78 noWrap openspan_CellSchemaId="83fd103d-ee4b-45e8-adb1-7df59fff170c">RobotAgent1 </TD>
      <TD class=same width=79 noWrap openspan_CellSchemaId="7016e294-b4f4-433f-8711-d798e0b90cf1">2017/09/20 </TD>
      <TD width=78 noWrap openspan_CellSchemaId="6c26eb0d-0195-45b8-a07b-5b0ac26c636d">ROBO7 </TD>
      <TD width=78 noWrap openspan_CellSchemaId="a7ebc00f-32ad-4fcf-b991-81983587a91d"> </TD>
      <TD class=same width=79 noWrap openspan_CellSchemaId="6e9ce592-309b-4288-9e96-09c3f3a4374d">2017/11/07 </TD>
      <TD width=78 noWrap openspan_CellSchemaId="941d3c8a-8d5d-42aa-943e-a07ccaba1629"> </TD>
    </TR>
    <TR openspan_RowSchemaId="efbd02b8-fb2f-4f1c-8091-2d82e9bd0220">
      <TD width=82 noWrap openspan_CellSchemaId="53174ff2-e55c-472b-ae59-6d430afb3f10"><A onclick="doScanNo('G1S1990716')" href="#">G1S1990716</A> </TD>
      <TD width=95 openspan_CellSchemaId="4be80351-39e7-4f8a-be50-9233436a3266">G1:一般支払 </TD>
      <TD width=100 openspan_CellSchemaId="6e5e1343-ef24-4e55-b2a6-9d7c422400c8">01:スキャン済 </TD>
      <TD width=84 noWrap openspan_CellSchemaId="99bb881d-2870-4742-9ebc-689337842cd5"> </TD>
      <TD width=160 noWrap openspan_CellSchemaId="26265b76-713e-4543-9bd5-334416e8b0df"><SPAN class=nowraptext><BR></SPAN></TD>
      <TD width=186 noWrap openspan_CellSchemaId="785a8e77-1803-4123-94d4-f878f372b86d"><SPAN class=nowraptext><BR></SPAN></TD>
      <TD width=78 noWrap openspan_CellSchemaId="83fd103d-ee4b-45e8-adb1-7df59fff170c">RobotAgent2</TD>
      <TD class=same width=79 noWrap openspan_CellSchemaId="7016e294-b4f4-433f-8711-d798e0b90cf1">2017/09/20 </TD>
      <TD width=78 noWrap openspan_CellSchemaId="6c26eb0d-0195-45b8-a07b-5b0ac26c636d">ROBO7 </TD>
      <TD width=78 noWrap openspan_CellSchemaId="a7ebc00f-32ad-4fcf-b991-81983587a91d"> </TD>
      <TD class=same width=79 noWrap openspan_CellSchemaId="6e9ce592-309b-4288-9e96-09c3f3a4374d">2017/11/13 </TD>
      <TD width=78 noWrap openspan_CellSchemaId="941d3c8a-8d5d-42aa-943e-a07ccaba1629">ROBO7 </TD>
    </TR>
  </TBODY>
</TABLE>
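The regex you need is this one:
(?<=941d3c8a-8d5d-42aa-943e-a07ccaba1629">)(.*?)(?=<\/TD>)
The lookbehind and lookahead are not included in the match, so each match is just the cell text. The lazy .*? stops at the nearest </TD> instead of the last one on the line, which matters when the whole table is on a single line.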
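The two lookarounds are zero-width, so match.Value contains only the text between them. The quantifier in the middle matters when the HTML is all on one line, as it is in the question. Greedy .* runs on to the last </TD> on the line and swallows both cells in a single match. Lazy .*? stops at the nearest </TD> and yields one match per cell. The sketch below demonstrates this on a condensed copy of the two target cells, with the other cells dropped. LookaroundDemo and Extract are illustrative names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class LookaroundDemo
{
    const string Id = "941d3c8a-8d5d-42aa-943e-a07ccaba1629";

    // Condensed one-line copy of the question's table: one filler cell plus the target cell per row.
    public const string OneLine =
        "<TR><TD>x </TD><TD openspan_CellSchemaId=\"" + Id + "\"> </TD></TR> " +
        "<TR><TD>y </TD><TD openspan_CellSchemaId=\"" + Id + "\">ROBO7 </TD></TR>";

    public static string[] Extract(string input, bool lazy)
    {
        // (?<=...) and (?=...) consume nothing, so only the text between them is matched.
        string body = lazy ? "(.*?)" : "(.*)";
        var regex = new Regex("(?<=" + Id + "\">)" + body + "(?=</TD>)", RegexOptions.IgnoreCase);
        return regex.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
    }

    static void Main()
    {
        // Greedy: a single match running to the last </TD> on the line.
        Console.WriteLine(Extract(OneLine, lazy: false).Length); // 1

        // Lazy: one match per target cell.
        foreach (string v in Extract(OneLine, lazy: true))
            Console.WriteLine("[" + v.Trim() + "]");              // [] then [ROBO7]
    }
}
```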
I'll post the C# solution for you shortly ;) (Updated)
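If you need to test the regex, try it here to check that the conditions hold. It matches everything between the two strings you specified: the schema ID followed by "> and the closing </TD>.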
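The title asks more generally for "the value between two strings", and the same lookaround idea can be wrapped in a small helper for that. This is a minimal sketch. GetValuesBetween and BetweenHelper are hypothetical names. Regex.Escape is used so that metacharacters such as . ( ) $ in the delimiters are treated literally. RegexOptions.Singleline is an added assumption that lets a value span line breaks:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class BetweenHelper
{
    // Hypothetical helper: every substring found between `start` and `end`, trimmed.
    public static List<string> GetValuesBetween(string input, string start, string end)
    {
        // Escape both delimiters so they are matched literally, not as regex syntax.
        string pattern = "(?<=" + Regex.Escape(start) + ")(.*?)(?=" + Regex.Escape(end) + ")";
        var values = new List<string>();

        // Singleline lets '.' cross line breaks; the lazy '*?' still stops at the first `end`.
        foreach (Match m in Regex.Matches(input, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline))
            values.Add(m.Value.Trim());
        return values;
    }

    static void Main()
    {
        const string id = "941d3c8a-8d5d-42aa-943e-a07ccaba1629";
        string html = "<TD x=\"" + id + "\"> </TD><TD x=\"" + id + "\">ROBO7 </TD>";

        foreach (string value in GetValuesBetween(html, id + "\">", "</TD>"))
            Console.WriteLine("[" + value + "]"); // [] then [ROBO7]
    }
}
```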
Here is a simple code example (I'm only printing the matches, but you can do whatever you want with them):
using System;
using System.IO;
using System.Text.RegularExpressions;

namespace test3
{
    class Program
    {
        static void Main(string[] args)
        {
            // Read the saved HTML (replace the path with your own file).
            string createText = File.ReadAllText(@"C:\PATH_TO_THE FILE_I_USED\data.txt");

            // Lookbehind: the schema ID plus the closing "> of the <TD> tag (not part of the match).
            // Lazy (.*?): stop at the nearest </TD>, not the last one on the line.
            // Lookahead: the closing </TD> (not part of the match).
            Regex regex = new Regex(
                "(?<=941d3c8a-8d5d-42aa-943e-a07ccaba1629\">)(.*?)(?=</TD>)",
                RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

            MatchCollection ms = regex.Matches(createText);
            foreach (Match match in ms)
            {
                // Trim the trailing space before </TD>; brackets make an empty value visible.
                Console.WriteLine("[" + match.Value.Trim() + "]");
            }
            Console.ReadLine();
        }
    }
}
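The loop in the question tried to return the 1-based index of the first value that starts with Search. It can be kept as-is once it runs against the working pattern. Because the lookarounds leave only the cell text in match.Value, no Groups[2] is needed. The sketch below shows this. FindCellIndex and TargetColumn are hypothetical names, and an ordinal StartsWith on the trimmed value is an assumption about what "starts with" should mean:

```csharp
using System;
using System.Text.RegularExpressions;

static class TargetColumn
{
    // Same lookaround pattern as above: only the text of each target cell is matched.
    static readonly Regex CellValue = new Regex(
        "(?<=941d3c8a-8d5d-42aa-943e-a07ccaba1629\">)(.*?)(?=</TD>)",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Hypothetical helper mirroring the question's loop: 1-based index of the first
    // target cell whose trimmed value starts with `search`, or 0 if there is none.
    public static int FindCellIndex(string inputText, string search)
    {
        int index = 1;
        foreach (Match match in CellValue.Matches(inputText))
        {
            if (match.Value.Trim().StartsWith(search, StringComparison.Ordinal))
                return index;
            index++;
        }
        return 0;
    }

    static void Main()
    {
        string html =
            "<TD openspan_CellSchemaId=\"941d3c8a-8d5d-42aa-943e-a07ccaba1629\"> </TD></TR>" +
            "<TR><TD openspan_CellSchemaId=\"941d3c8a-8d5d-42aa-943e-a07ccaba1629\">ROBO7 </TD></TR>";

        Console.WriteLine(FindCellIndex(html, "ROBO7")); // 2
        Console.WriteLine(FindCellIndex(html, "ROBO8")); // 0
    }
}
```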