

C#: How to filter/get the value of a specific column in an HTML table with Regex. Get the value between two strings

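I want to filter the items that contain "941d3c8a-8d5d-42aa-943e-a07ccaba1629" and get the value inside each filtered item. For the first filtered item the value is "&nbsp;", and for the second it is "ROBO7".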

I'm using a regular expression in C# as shown below, but I can't seem to get the correct values. Any help is appreciated.

    Regex regex = new Regex(
        "(<TD[^>]*?>^941d3c8a-8d5d-42aa-943e-a07ccaba1629$<\\/TD>)",
        RegexOptions.IgnoreCase
        | RegexOptions.CultureInvariant
        | RegexOptions.IgnorePatternWhitespace
        | RegexOptions.Compiled
    );
    MatchCollection ms = regex.Matches(InputText);
    int index = 1;
    foreach (Match match in ms)
    {
        if (match.Groups[2].Value.StartsWith(Search))
            return index;
        index++;
    }
    return 0;
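A quick way to see why the pattern above finds nothing is to run it against a single cell. This is an editor's sketch, not part of the original post, and the class and method names are hypothetical. It assumes the default RegexOptions (no Multiline), under which ^ and $ only match at the very start and end of the whole input. It also shows a second problem: the GUID is the value of the openspan_CellSchemaId attribute inside the opening tag, not the text of the cell.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class: checks the question's pattern against one cell.
static class WhyNoMatch
{
    const string Id = "941d3c8a-8d5d-42aa-943e-a07ccaba1629";

    // One <TD> copied from the question's table.
    const string Cell =
        "<TD width=78 noWrap openspan_CellSchemaId=\"" + Id + "\">ROBO7 </TD>";

    // The question's pattern: once "<TD ...>" has been consumed the position is no
    // longer the start of the input, so ^ can never succeed here.
    public static bool Anchored() =>
        Regex.IsMatch(Cell, "<TD[^>]*?>^" + Id + "$</TD>", RegexOptions.IgnoreCase);

    // Without the anchors it still fails: the text after '>' is "ROBO7 ", not the id.
    public static bool Unanchored() =>
        Regex.IsMatch(Cell, "<TD[^>]*?>" + Id + "</TD>", RegexOptions.IgnoreCase);

    // Looking for the id inside the opening tag and capturing what follows it works.
    public static string AttributeBased() =>
        Regex.Match(Cell, "<TD[^>]*\"" + Id + "\">(.*?)</TD>", RegexOptions.IgnoreCase)
             .Groups[1].Value;

    static void Main()
    {
        Console.WriteLine(Anchored());                   // False
        Console.WriteLine(Unanchored());                 // False
        Console.WriteLine("[" + AttributeBased() + "]"); // [ROBO7 ]
    }
}
```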


    <TABLE class="mainlayout fixedlayouttable" cellSpacing=0 cellPadding=2 width=1223 summary=* border=0>
      <TBODY>
        <TR openspan_RowSchemaId="efbd02b8-fb2f-4f1c-8091-2d82e9bd0220">
          <TD width=82 noWrap openspan_CellSchemaId="53174ff2-e55c-472b-ae59-6d430afb3f10"><A onclick="doScanNo('G1S1990706')" href="#">G1S1990706</A> </TD>
          <TD width=95 openspan_CellSchemaId="4be80351-39e7-4f8a-be50-9233436a3266">G1:一般支払 </TD>
          <TD width=100 openspan_CellSchemaId="6e5e1343-ef24-4e55-b2a6-9d7c422400c8">R1:QRなし </TD>
          <TD width=84 noWrap openspan_CellSchemaId="99bb881d-2870-4742-9ebc-689337842cd5">&nbsp; </TD>
          <TD width=160 noWrap openspan_CellSchemaId="26265b76-713e-4543-9bd5-334416e8b0df"><SPAN class=nowraptext><BR></SPAN></TD>
          <TD width=186 noWrap openspan_CellSchemaId="785a8e77-1803-4123-94d4-f878f372b86d"><SPAN class=nowraptext><BR></SPAN></TD>
          <TD width=78 noWrap openspan_CellSchemaId="83fd103d-ee4b-45e8-adb1-7df59fff170c">RobotAgent1 </TD>
          <TD class=same width=79 noWrap openspan_CellSchemaId="7016e294-b4f4-433f-8711-d798e0b90cf1">2017/09/20 </TD>
          <TD width=78 noWrap openspan_CellSchemaId="6c26eb0d-0195-45b8-a07b-5b0ac26c636d">ROBO7 </TD>
          <TD width=78 noWrap openspan_CellSchemaId="a7ebc00f-32ad-4fcf-b991-81983587a91d">&nbsp; </TD>
          <TD class=same width=79 noWrap openspan_CellSchemaId="6e9ce592-309b-4288-9e96-09c3f3a4374d">2017/11/07 </TD>
          <TD width=78 noWrap openspan_CellSchemaId="941d3c8a-8d5d-42aa-943e-a07ccaba1629">&nbsp; </TD>
        </TR>
        <TR openspan_RowSchemaId="efbd02b8-fb2f-4f1c-8091-2d82e9bd0220">
          <TD width=82 noWrap openspan_CellSchemaId="53174ff2-e55c-472b-ae59-6d430afb3f10"><A onclick="doScanNo('G1S1990716')" href="#">G1S1990716</A> </TD>
          <TD width=95 openspan_CellSchemaId="4be80351-39e7-4f8a-be50-9233436a3266">G1:一般支払 </TD>
          <TD width=100 openspan_CellSchemaId="6e5e1343-ef24-4e55-b2a6-9d7c422400c8">01:スキャン済 </TD>
          <TD width=84 noWrap openspan_CellSchemaId="99bb881d-2870-4742-9ebc-689337842cd5">&nbsp; </TD>
          <TD width=160 noWrap openspan_CellSchemaId="26265b76-713e-4543-9bd5-334416e8b0df"><SPAN class=nowraptext><BR></SPAN></TD>
          <TD width=186 noWrap openspan_CellSchemaId="785a8e77-1803-4123-94d4-f878f372b86d"><SPAN class=nowraptext><BR></SPAN></TD>
          <TD width=78 noWrap openspan_CellSchemaId="83fd103d-ee4b-45e8-adb1-7df59fff170c">RobotAgent2</TD>
          <TD class=same width=79 noWrap openspan_CellSchemaId="7016e294-b4f4-433f-8711-d798e0b90cf1">2017/09/20 </TD>
          <TD width=78 noWrap openspan_CellSchemaId="6c26eb0d-0195-45b8-a07b-5b0ac26c636d">ROBO7 </TD>
          <TD width=78 noWrap openspan_CellSchemaId="a7ebc00f-32ad-4fcf-b991-81983587a91d">&nbsp; </TD>
          <TD class=same width=79 noWrap openspan_CellSchemaId="6e9ce592-309b-4288-9e96-09c3f3a4374d">2017/11/13 </TD>
          <TD width=78 noWrap openspan_CellSchemaId="941d3c8a-8d5d-42aa-943e-a07ccaba1629">ROBO7 </TD>
        </TR>
      </TBODY>
    </TABLE>

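The regular expression you need is this one:

(?<=941d3c8a-8d5d-42aa-943e-a07ccaba1629">)(.*?)(?=<\/TD>)

The lookbehind starts the match right after the schema id and the "> that closes the <TD> tag, the lookahead ends it just before </TD>, and the lazy .*? makes each match stop at the first </TD> rather than the last one on the line.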
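The difference between a greedy (.*) and a lazy (.*?) capture only shows up when several cells sit on the same line, which is what happens when the page source is saved without line breaks. The sketch below runs the same lookbehind/lookahead pattern both ways on a trimmed-down, single-line, two-row sample; the class name and the "other" schema id are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class comparing greedy and lazy captures on one line of HTML.
static class GreedyVsLazy
{
    const string Id = "941d3c8a-8d5d-42aa-943e-a07ccaba1629";

    // Two rows on a single line, trimmed down from the question's table.
    public const string Html =
        "<TR><TD openspan_CellSchemaId=\"" + Id + "\">&nbsp; </TD></TR> " +
        "<TR><TD openspan_CellSchemaId=\"other\">RobotAgent2</TD>" +
        "<TD openspan_CellSchemaId=\"" + Id + "\">ROBO7 </TD></TR>";

    // Runs the lookbehind/lookahead pattern with the given capture body (".*" or ".*?").
    public static string[] Values(string body) =>
        Regex.Matches(Html, "(?<=" + Id + "\">)(" + body + ")(?=</TD>)", RegexOptions.IgnoreCase)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    static void Main()
    {
        // Greedy: a single match that runs on to the last </TD> on the line.
        foreach (string v in Values(".*")) Console.WriteLine("greedy: [" + v + "]");
        // Lazy: one match per cell, each ending at the first </TD> after the id.
        foreach (string v in Values(".*?")) Console.WriteLine("lazy:   [" + v + "]");
    }
}
```

With the greedy version the first match swallows the rest of the line, including the unrelated RobotAgent2 cell, and the second ROBO7 cell is never reported on its own.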

I'll post a C# solution for you shortly ;) (Update: posted below.)

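If you want to test the regex, try it in an online regex tester to confirm that the condition works. It matches everything between the two strings according to your condition.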
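If all you need is "the text between two strings", a regex is not strictly required. As an alternative technique (not the answer's approach), here is a minimal sketch built on string.IndexOf that mirrors the lazy pattern: for each occurrence of the start marker it takes the text up to the next end marker, comparing case-insensitively. The class and method names are hypothetical, and the markers are assumed to be non-empty.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: plain-string counterpart of (?<=start)(.*?)(?=end), case-insensitive.
static class TextBetween
{
    public static List<string> FindAll(string text, string start, string end)
    {
        if (start.Length == 0 || end.Length == 0)
            throw new ArgumentException("Markers must be non-empty.");

        var results = new List<string>();
        int pos = 0;
        while (pos < text.Length)
        {
            // Locate the next start marker; stop when there are no more.
            int s = text.IndexOf(start, pos, StringComparison.OrdinalIgnoreCase);
            if (s < 0) break;
            s += start.Length;

            // The first end marker after it closes this value (like the lazy .*?).
            int e = text.IndexOf(end, s, StringComparison.OrdinalIgnoreCase);
            if (e < 0) break;

            results.Add(text.Substring(s, e - s));
            pos = e + end.Length;
        }
        return results;
    }

    static void Main()
    {
        string html =
            "<TD openspan_CellSchemaId=\"941d3c8a-8d5d-42aa-943e-a07ccaba1629\">&nbsp; </TD>" +
            "<TD openspan_CellSchemaId=\"941d3c8a-8d5d-42aa-943e-a07ccaba1629\">ROBO7 </TD>";
        foreach (string v in FindAll(html, "941d3c8a-8d5d-42aa-943e-a07ccaba1629\">", "</TD>"))
            Console.WriteLine("[" + v.Trim() + "]");
    }
}
```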

Here is a simple code example (I'm just printing the results, but you can do whatever you like with them):

using System;
using System.IO;
using System.Text.RegularExpressions;

namespace test3
{
    class Program
    {
        static void Main(string[] args)
        {
            // Read the saved HTML (replace the placeholder path with your own file).
            string createText = File.ReadAllText(@"C:\PATH_TO_THE FILE_I_USED\data.txt");

            // (?<=...">) : start right after the schema id and the '>' that closes the <TD> tag
            // (.*?)      : lazy, so each match stops at the first </TD> after the id
            // (?=</TD>)  : end just before </TD>, without including it in the match
            Regex regex = new Regex("(?<=941d3c8a-8d5d-42aa-943e-a07ccaba1629\">)(.*?)(?=</TD>)",
                RegexOptions.IgnoreCase
                | RegexOptions.CultureInvariant
                | RegexOptions.IgnorePatternWhitespace
                | RegexOptions.Compiled
                );

            MatchCollection ms = regex.Matches(createText);

            foreach (Match match in ms)
            {
                // Trim the space that precedes </TD> in the source HTML.
                Console.WriteLine(match.Value.Trim());
            }

            Console.ReadLine();
        }
    }
}
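The program above hard-codes one GUID and prints the raw cell text. One possible generalisation, sketched under the assumption that every cell carries an openspan_CellSchemaId attribute as in the sample: take the schema id as a parameter (escaped with Regex.Escape), let RegexOptions.Singleline allow line breaks inside a cell, and decode entities with System.Net.WebUtility.HtmlDecode. ColumnReader and GetColumnValues are hypothetical names. Note the design choice: decoding turns &nbsp; into U+00A0, which Trim() then removes, so a cell that holds only &nbsp; comes back as an empty string rather than the literal "&nbsp;".

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: reads one column, identified by its openspan_CellSchemaId.
static class ColumnReader
{
    public static List<string> GetColumnValues(string html, string cellSchemaId)
    {
        // <TD ...openspan_CellSchemaId="<id>"...> followed by the cell text up to the first </TD>.
        string pattern = "<TD\\b[^>]*\\bopenspan_CellSchemaId=\""
                         + Regex.Escape(cellSchemaId) + "\"[^>]*>(.*?)</TD>";

        var values = new List<string>();
        foreach (Match m in Regex.Matches(html, pattern,
                     RegexOptions.IgnoreCase | RegexOptions.Singleline))
        {
            // HtmlDecode maps &nbsp; to U+00A0; Trim() strips it along with ordinary spaces.
            values.Add(WebUtility.HtmlDecode(m.Groups[1].Value).Trim());
        }
        return values;
    }

    // Two rows trimmed from the question's table, with a line break between them.
    public const string SampleHtml =
        "<TR><TD width=78 noWrap openspan_CellSchemaId=\"941d3c8a-8d5d-42aa-943e-a07ccaba1629\">&nbsp; </TD></TR>\n" +
        "<TR><TD width=78 noWrap openspan_CellSchemaId=\"6c26eb0d-0195-45b8-a07b-5b0ac26c636d\">ROBO7 </TD>" +
        "<TD width=78 noWrap openspan_CellSchemaId=\"941d3c8a-8d5d-42aa-943e-a07ccaba1629\">ROBO7 </TD></TR>";

    static void Main()
    {
        foreach (string v in GetColumnValues(SampleHtml, "941d3c8a-8d5d-42aa-943e-a07ccaba1629"))
            Console.WriteLine("[" + v + "]");
    }
}
```

To read the real file instead, pass File.ReadAllText(...) as the html argument.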


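Disclaimer: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM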