Optimized regular expression in C# for searching multiline expressions in text

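I am trying to find expressions of the following form in a text file:

&lt;[some text][newline][some text]&gt;

Note that there may be many line breaks before the closing marker &gt; is reached.

I tried the following regular expression: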

&lt;(.*?\n.*?)&gt;

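It works well for expressions that contain a single line break, but I also need to find expressions that are spread over several lines.

I also tried the following expression: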

&lt;(.*?\\n.*?)*&gt;

But the search ends in a timeout. Can anyone help?

Sample text to search:

<p class=3DMsoNormal style=3D'margin-top:12.0pt;margin-right:0cm;margin-bot=
tom:
0cm;margin-left:148.85pt;margin-bottom:.0001pt;text-indent:-148.85pt;
tab-stops:148.85pt right 16.0cm'><b style=3D'mso-bidi-font-weight:normal'><=
span
style=3D'font-family:"Calibri","sans-serif"'>RISK DETAILS<span style=3D'mso=
-tab-count:
1'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></b><span
style=3D'font-family:"Calibri","sans-serif"'>Your home is described as
&lt;q_1&gt;<o:p></o:p></span></p>

<p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
:0cm;
margin-left:148.85pt;margin-bottom:.0001pt'><span style=3D'font-family:"Cal=
ibri","sans-serif"'>The
construction of your home is &lt;q_2&gt;<o:p></o:p></span></p>

<p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
:0cm;
margin-left:148.85pt;margin-bottom:.0001pt'><span style=3D'font-family:"Cal=
ibri","sans-serif"'>The
main roof material is &lt;q_3&gt;<o:p></o:p></span></p>

<p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
:0cm;
margin-left:148.85pt;margin-bottom:.0001pt'><span style=3D'font-family:"Cal=
ibri","sans-serif"'>Your
home was built in &lt;q_4&gt;<o:p></o:p></span></p>

<p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
:0cm;
margin-left:148.85pt;margin-bottom:.0001pt'><span style=3D'font-family:"Cal=
ibri","sans-serif"'>Your
<span class=3DGramE>home &lt;q_5&gt; double</span> keyed deadlocks to all
external doors<o:p></o:p></span></p>

<p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
:0cm;
margin-left:148.85pt;margin-bottom:.0001pt'><span style=3D'font-family:"Cal=
ibri","sans-serif"'>Your
home &lt;q_6&gt; keyed locks or grilles on all windows<o:p></o:p></span></p>

<p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
:0cm;
margin-left:148.85pt;margin-bottom:.0001pt'><span style=3D'font-family:"Cal=
ibri","sans-serif"'>Your
home has &lt;q_7&gt; alarm installed<o:p></o:p></span></p>

<p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
:0cm;
margin-left:148.85pt;margin-bottom:.0001pt'><span style=3D'font-family:"Cal=
ibri","sans-serif"'>Your
home &lt;q_8&gt; connected to mains water supply<o:p></o:p></span></p>

Some examples. Example 1, text to search:

 <span
      style=3D'color:blue'><o:p></o:p></span></span></p>
      </td>
      <td width=3D103 valign=3Dtop style=3D'width:77.5pt;padding:0cm 5.4pt 0cm =
    0cm'>
      <p class=3DMsoNormal align=3Dright style=3D'margin-top:3.0pt;margin-right=
    :0cm;
      margin-bottom:0cm;margin-left:0cm;margin-bottom:.0001pt;text-align:right;
      tab-stops:155.95pt'><span style=3D'font-family:"Calibri","sans-serif"'>&lt;=
    <span
      class=3DSpellE>spec_contents_value</span>&gt;<span style=3D'color:blue'><=
    o:p></o:p></span></span></p>
      </td>
     </tr>
    </table>

    <p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
    :0cm;
    margin-left:148.85pt;margin-bottom:.0001pt;text-indent:-148.85pt;tab-stops:
    148.85pt right 453.55pt'><span style=3D'font-family:"Calibri","sans-serif"'=
    ><o:p>&nbsp;</o:p></span></p>

    <p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
    :0cm;
    margin-left:148.85pt;margin-bottom:.0001pt;text-indent:-148.85pt;tab-stops:
    148.85pt right 453.55pt'><span style=3D'font-family:"Calibri","sans-serif"'=
    >Unspecified
    Valuables<b style=3D'mso-bidi-font-weight:normal'><span style=3D'mso-tab-co=
    unt:
    1'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </=
    span></b>&lt;valuables&gt;<o:p></o:p></span></p>

    <p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
    :0cm;
    margin-left:148.85pt;margin-bottom:.0001pt;text-indent:-148.85pt;tab-stops:
    148.85pt right 453.55pt'><span style=3D'font-family:"Calibri","sans-serif"'=
    >Specified
    Valuables<b style=3D'mso-bidi-font-weight:normal'><span style=3D'mso-tab-co=
    unt:
    1'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
    sp;&nbsp;&nbsp;&nbsp;&nbsp; </span></b>&lt;<spanclass=3DSpellE>spec_valuables_ni</span>&gt;=
    <o:p></o:p></span></p>

I want my Regex.Match pattern to find:

&lt;=
<span
  class=3DSpellE>spec_contents_value</span>&gt;

or any &lt;...&gt; pattern that spans multiple lines, but not the ones that sit entirely on a single line.

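Use the DOTALL modifier (?s), which .NET calls RegexOptions.Singleline, so that the dot also matches line-break characters (\n, \r).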

(?s)&lt;(?:(?!&[gl]t;).)*?\n(?:(?!&[gl]t;).)*?&gt;
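As a rough illustration, the pattern above could be used from C# along the following lines. The sample string, the variable names, and the two-second match timeout are assumptions made for this sketch and do not come from the original post. It is written with top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample: one placeholder on a single line, one split across lines.
string text =
    "home is &lt;q_1&gt;<o:p></o:p>\n" +
    "'>&lt;=\n<span\n  class=3DSpellE>spec_contents_value</span>&gt;<span>";

// (?s) lets '.' match line breaks. The tempered dot (?:(?!&[gl]t;).) refuses to
// step over another &lt; or &gt;, and the literal \n requires at least one line break.
// The timeout is an assumed safety net so a bad pattern fails fast instead of hanging.
var pattern = new Regex(
    @"(?s)&lt;(?:(?!&[gl]t;).)*?\n(?:(?!&[gl]t;).)*?&gt;",
    RegexOptions.None,
    TimeSpan.FromSeconds(2));

string[] found = pattern.Matches(text)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Only the placeholder that spans several lines is reported.
foreach (string value in found)
{
    Console.WriteLine(value);
}
```

Because each tempered dot consumes one character at a time and the two lazy loops are not nested inside another quantifier, this form avoids the nested-quantifier shape of the second pattern in the question.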


How about this regular expression:

 &lt;[^&]*&gt;

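For an example, see http://regex101.com/r/iV9lS4/3

  • &lt; matches the literal &lt;

  • [^&]* matches zero or more characters other than &, including line breaks

  • &gt; matches the literal &gt;

You can also use . to match any character by supplying the DOTALL modifier (?s).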

For the input

&lt;=
<span
  class=3DSpellE>spec_contents_value</span>&gt;

it will match, as shown at http://regex101.com/r/iV9lS4/4
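A minimal C# sketch of this approach might look as follows. The sample string and variable names are hypothetical. The newline filter is an assumed extra step, not part of the answer, added because the negated character class also matches placeholders that sit on one line. The dot-based variant uses a lazy .*? as one possible reading of the DOTALL suggestion. The code uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample combining a single-line and a multi-line placeholder.
string sample =
    "home is &lt;q_1&gt;<o:p></o:p>\n" +
    "'>&lt;=\n<span\n  class=3DSpellE>spec_contents_value</span>&gt;<span>";

// [^&]* is a negated character class, so it crosses line breaks without any option.
string[] all = Regex.Matches(sample, @"&lt;[^&]*&gt;")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Assumed extra step (not in the answer): keep only matches that contain a line break.
string[] multiLine = all.Where(v => v.Contains("\n")).ToArray();

// Dot-based variant: RegexOptions.Singleline is the .NET equivalent of (?s).
string[] viaDot = Regex.Matches(sample, @"&lt;.*?&gt;", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine($"{all.Length} total, {multiLine.Length} spanning lines");
```

One thing to keep in mind is that [^&]* stops at any ampersand, so a placeholder whose body contains another entity such as &nbsp; would not be matched by the character-class version.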
