
Slow regex in Python?

I am trying to match strings like this

{@csm.foo.bar}

without matching either of these

{@csm.foo.bar-@csm.ooga.booga}
{@csm.foo.bar-42}

The regex I am using is

r"\{@csm.((?:[a-zA-Z0-9_]+\.?)+)\}"

It gets very slow if the string contains several matches. Why? It runs very fast if I take out the brace matching, like this

r"@csm.((?:[a-zA-Z0-9_]+\.?)+)"

But that is not what I want.

Any ideas?

Here is the sample input:

<dockLayout id="popup" y="0" x="0" width="{@csm.screenWidth}" height="{@csm.screenHeight}">
  <dataNumber id="selopacity_Volt" name="selopacity_Volt" value="0" />
  <dataNumber id="selopacity_Amp" name="selopacity_Amp" value="0" />
  <animate  trigger="{@m_ds_ML.VIMPBM_BatteryVoltage.valstr}" triggerOn="*"  targetNode="selopacity_Volt"  targetAttr="value" to="1" dur="0ms" ease="in" />
  <animate  trigger="{@m_ds_ML.VIMPBM_BatteryVoltage.valstr}" triggerOn="65024" targetNode="selopacity_Volt"  targetAttr="value" to="0" dur="0ms" ease="in" />
  <animate  trigger="{@m_ds_ML.VIMPBM_BatteryCurrent.valstr}" triggerOn="*"  targetNode="selopacity_Amp" targetAttr="value" to="1" dur="0ms" ease="in" />
  <animate  trigger="{@m_ds_ML.VIMPBM_BatteryCurrent.valstr}" triggerOn="65024" targetNode="selopacity_Amp"  targetAttr="value" to="0" dur="0ms" ease="in" />
  <dockLayout id="item" width="{@csm.screenWidth}" height="{@csm.screenHeight}" depth="-1" clip="false" xmlns="http://www.tat.se/kastor/kml" >
    <dockLayout id="list_item_title" x="0" width="{@csm.screenWidth}" height="{@csm.Gearselection.text_heght-@csm.pageVisualCP_y}">
      <text id="volt_amp_text" x="0" ellipsize="false" font="{@csm.listUnselFont}" color="{@csm.itemUnselColor}" dockLayout.halign="left" dockLayout.valign="bottom" string="{ItemTitle}" />            
    </dockLayout>    
    <dockLayout id="gear_layout" y="0" x="0" width="{@csm.screenWidth}" height="{@csm.vmImage_y_gearselection-@csm.pageVisualCP_y}">
      <image id="battery_image" x="0" dockLayout.halign="left" dockLayout.valign="bottom" opacity="1" src="{@m_MenuModel.Gauges.VoltAmpereMeter.image}"/>
    </dockLayout>
    <!--DockLayout for Voltage Value-->
    <dockLayout id="volt_value" x="0" width="{@csm.VoltAmpereMeter.volt_value_x-@csm.VoltAmpereMeter.List_x}" height="{@csm.vmImage_y_gearselection-@csm.pageVisualCP_y}">
      <text id="volt_value_text" x="0" opacity="{selopacity_Volt*selopacity_Amp}" ellipsize="false" font="{@csm.listUnselFont}" color="{@csm.itemSelColor}" dockLayout.halign="right" dockLayout.valign="bottom" string="{@m_ds_ML.VIMPBM_BatteryVoltage.valstr}" >     
      </text>
    </dockLayout>   
    <!--DockLayout for Voltage Unit-->
    <dockLayout id="volt_unit" x="{@csm.VoltAmpereMeter.volt_unit_x-@csm.VoltAmpereMeter.List_x}" width="{@csm.screenWidth}" height="{@csm.vmImage_y_gearselection-@csm.pageVisualCP_y}">
      <text id="volt_unit_text" x="0" opacity="{selopacity_Volt*selopacity_Amp}" ellipsize="false" font="{@csm.listUnselFont}" color="{@csm.itemSelColor}" dockLayout.halign="left" dockLayout.valign="bottom" string="V" >         
      </text>
    </dockLayout>
    <!--DockLayout for Ampere Value-->
    <dockLayout id="ampere_value" x="0" width="{@csm.VoltAmpereMeter.ampere_value_x-@csm.VoltAmpereMeter.List_x}" height="{@csm.vmImage_y_gearselection-@csm.pageVisualCP_y}">
      <text id="ampere_value_text" x="0" opacity="{selopacity_Amp*selopacity_Volt}" ellipsize="false" font="{@csm.listUnselFont}" color="{@csm.itemSelColor}" dockLayout.halign="right" dockLayout.valign="bottom" string="{@m_ds_ML.VIMPBM_BatteryCurrent.valstr}" >   
      </text>
    </dockLayout>
    <!--DockLayout for Ampere Unit-->
    <dockLayout id="ampere_unit" x="{@csm.VoltAmpereMeter.ampere_unit_x-@csm.VoltAmpereMeter.List_x}" width="{@csm.screenWidth}" height="{@csm.vmImage_y_gearselection-@csm.pageVisualCP_y}">
      <text id="ampere_unit_text" x="0" opacity="{selopacity_Amp*selopacity_Volt}" ellipsize="false" font="{@csm.listUnselFont}" color="{@csm.itemSelColor}" dockLayout.halign="left" dockLayout.valign="bottom" string="A" >           
      </text>
    </dockLayout>
    <!--DockLayout for containing Data Not Available text-->
    <dockLayout id="no_data_textline" x="{@csm.VoltAmpereMeter.List_x1-@csm.VoltAmpereMeter.List_x}" width="{@csm.screenWidth}" height="{@csm.vmImage_y_gearselection-@csm.pageVisualCP_y}">
      <text id="no_data_text" x="0" opacity="{1-(selopacity_Amp*selopacity_Volt)}" ellipsize="false" font="{@csm.listSelFont}" color="{@csm.itemSelColor}" dockLayout.halign="left" dockLayout.valign="bottom" string="{text1}" >           
      </text>
    </dockLayout>
    <!--<rect id="test_rect1" x="{151-28}" y="0" width="1" height="240" opacity="1" fill="#00ff00" />
              <rect id="test_rect1" x="{237-28}" y="0" width="1" height="240" opacity="1" fill="#00ff00" />
              <rect id="test_rect1" x="{160-28}" y="0" width="1" height="240" opacity="1" fill="#00ff00" />
              <rect id="test_rect1" x="{246-28}" y="0" width="1" height="240" opacity="1" fill="#00ff00" />
              <rect id="test_rect8" x="0" y="{161-40}" width="320" height="1" opacity="1" fill="#00ff00" />
              <rect id="test_rect1" x="{109-28}" y="0" width="1" height="240" opacity="1" fill="#00ff00" />-->
  </dockLayout>  
</dockLayout>

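Can you provide a test case of a string for which the first match is "dog slow"? By the way, although I don't know whether it matters for performance, there is an imprecision in the RE: it matches any single character after the leading {@csm, not just a dot. A better expression (possibly faster, since it doesn't make any dot "optional") might be: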

r'\{@csm((?:\.\w+)+)\}'
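As a quick check of this suggestion, here is a minimal sketch that runs the dotted pattern against the three strings from the question. The names `PATTERN`, `samples`, and `results` are made up for illustration and are not part of the original answer.

```python
import re

# Pattern suggested in the answer: every segment must begin with a literal
# dot, so a run of word characters can only be split into segments one way.
PATTERN = re.compile(r'\{@csm((?:\.\w+)+)\}')

samples = [
    "{@csm.foo.bar}",                  # the question wants this to match
    "{@csm.foo.bar-@csm.ooga.booga}",  # the question wants this rejected
    "{@csm.foo.bar-42}",               # the question wants this rejected
]

# Map each sample to whether the pattern finds a match in it.
results = {s: bool(PATTERN.search(s)) for s in samples}
for s, matched in results.items():
    print(s, "->", matched)
```

One caveat: in Python 3, `\w` also matches non-ASCII word characters unless `re.ASCII` is passed, so it is slightly broader than `[a-zA-Z0-9_]`.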

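I am not exactly a regex expert, but this may be caused by the brace at the end of the match. You could try matching r"\{@csm.((?:[a-zA-Z0-9_]+\.?)+)" and simply checking by hand that the closing brace follows the match.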
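A rough sketch of that idea, assuming the brace check is done on the character right after each match. `PREFIX` and `find_refs` are hypothetical names chosen for this example.

```python
import re

# The question's pattern without the trailing \} (the unescaped dot after
# @csm is kept exactly as in the question).
PREFIX = re.compile(r"\{@csm.((?:[a-zA-Z0-9_]+\.?)+)")

def find_refs(text):
    """Yield the dotted name of each reference that is closed by '}'
    immediately after the name."""
    for m in PREFIX.finditer(text):
        end = m.end()
        # Manual replacement for the trailing \} in the original regex.
        if text[end:end + 1] == "}":
            yield m.group(1)

sample = "{@csm.foo.bar} {@csm.foo.bar-42} {@csm.foo.bar-@csm.ooga.booga}"
print(list(find_refs(sample)))
```

Because nothing has to match after the greedy group, the regex engine accepts its first attempt at each candidate and has no reason to retry other ways of splitting the name.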

You may need to give a better example of where it is slow. For a reasonably long string containing both matches and non-matches:

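import re

# Build a long test string that alternates non-matching and matching references.
x = "".join('{@csm.foo.bar-%d}\n{@csm.foo.%dx.baz}\n' % (a, a)
            for a in range(10000))
mymatch = r"\{@csm.((?:[a-zA-Z0-9_]+\.?)+)\}"

# Print every full match found in the test string.
for y in re.finditer(mymatch, x):
    print(y.group(0))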

works fine, but if you have long enough strings and a badly performing search, you may well run into problems.
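One likely source of such problems, offered here as an assumption rather than something the answers state: because the dot in `[a-zA-Z0-9_]+\.?` is optional, a long name can be split into segments in exponentially many ways, and a backtracking engine such as Python's `re` tries them all when the closing `\}` fails to match. Names like `vmImage_y_gearselection` in the sample input, followed by `-` rather than `}`, fit that shape. The sketch below times the original pattern against the dotted one on a made-up failing input; `ORIGINAL`, `DOTTED`, and `time_search` are hypothetical names.

```python
import re
import time

ORIGINAL = re.compile(r"\{@csm.((?:[a-zA-Z0-9_]+\.?)+)\}")
DOTTED = re.compile(r"\{@csm((?:\.\w+)+)\}")

def time_search(pattern, text):
    """Return (match, seconds) for a single pattern.search call."""
    start = time.perf_counter()
    match = pattern.search(text)
    return match, time.perf_counter() - start

for n in (12, 15, 18):
    # Made-up failing input: one long name followed by '-' instead of '}'.
    text = "{@csm." + "a" * n + "-x}"
    _, slow = time_search(ORIGINAL, text)
    _, fast = time_search(DOTTED, text)
    print(f"n={n}: original {slow:.4f}s, dotted {fast:.4f}s")
```

The lengths are kept small so the script finishes quickly; the time for the original pattern should grow sharply as `n` increases, while the dotted pattern should stay roughly flat.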

