Python - Regex to match multiple patterns across multiple lines

Question

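I'm currently getting into building regular expressions and, overall, have had some great success. But I have one particular case that is stumping me. I can get the match I want, but it isn't pretty, and it isn't well done in any way, shape, or form.

I'm regex-matching across multiple lines in some HTML documents. These documents contain blocks of information I need. Each block is identified by a pattern with a variable part, and once a block is matched I extract the information I need from it.

There are multiple blocks of HTML containing the information I need, and they look like this: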

<td headers="col0" class="OraTableCellNumber" style=";" nowrap="1"  valign="top" ><a href='/Orion/PatchDetails/process_form?patch_num=6880880&aru=13915384&release=80101000&plat_lang=226P&patch_num_id=979662&' title="View Patch Details">6880880</a></td>
<td headers="col0" class="OraTableCellText" style=";"   valign="top" ><b>Universal Installer</b>: Patch<br>OPatch 9i, 10.1</td>
<td headers="col0" class="OraTableCellText" style=";"   valign="top" >10.1.0.0.0</td>
<td headers="col0" class="OraTableCellText" style=";" nowrap="1"  valign="top" >08-JUL-2011</td>
<td headers="col0" class="OraTableCellText" style=";"   valign="top" >25M</td>
<td headers="col0" class="OraTableCellText" style=";text-align: center;"   valign="middle" width="15"><a href='javascript:showDetails("/Orion/Readme/process_form?aru=13915384&no_header=1")'><img src="/olaf/images/forms/readme.gif" valign=bottom border=0 title="View Readme" alt="View Readme"></a></td>
<td headers="col0" class="OraTableCellText" style=";text-align: center;"   valign="middle" width="15"><a href="https://updates.oracle.com/Orion/Download/process_form/p6880880_101000_Linux-x86-64.zip?aru=13915384&file_id=42098007&patch_file=p6880880_101000_Linux-x86-64.zip&"><img src="/olaf/images/forms/download.gif" valign=bottom border=0 title="Download Now" alt="Download Now"></a></td></tr>
<tr class="OraBGAccentLight" height="28" onMouseOver="javascript:setRowClass(this, 'highlight', 1);" onMouseOut="javascript:setRowClass(this, 'highlight', 0);">

I'm currently working in Python, and my regex is:

re.compile(r"/Orion/PatchDetails/process_form.+?release=80102000.*\n.*\n.*\n.*\n.*\n.*\n.*zip[^\"]*", re.MULTILINE)

The output I want is:

20180516140046EDT - DEBUG - ['/Orion/PatchDetails/process_form?patch_num=6880880&aru=13116068&release=80102000&plat_lang=226P&patch_num_id=979663&\' title="View Patch Details">6880880</a></td>\n<td headers="col0" class="OraTableCellText" style=";"   valign="top" ><b>Universal Installer</b>: Patch<br>OPatch 10.2</td>\n<td headers="col0" class="OraTableCellText" style=";"   valign="top" >10.2.0.0.0</td>\n<td headers="col0" class="OraTableCellText" style=";" nowrap="1"  valign="top" >18-NOV-2010</td>\n<td headers="col0" class="OraTableCellText" style=";"   valign="top" >26M</td>\n<td headers="col0" class="OraTableCellText" style=";text-align: center;"   valign="middle" width="15"><a href=\'javascript:showDetails("/Orion/Readme/process_form?aru=13116068&no_header=1")\'><img src="/olaf/images/forms/readme.gif" valign=bottom border=0 title="View Readme" alt="View Readme"></a></td>\n<td headers="col0" class="OraTableCellText" style=";text-align: center;"   valign="middle" width="15"><a href="https://updates.oracle.com/Orion/Download/process_form/p6880880_102000_Linux-x86-64.zip?aru=13116068&file_id=34545782&patch_file=p6880880_102000_Linux-x86-64.zip&']

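I want to extract a list of releases and then use them as search criteria to pull out the download URLs. I'm generally open to different solutions, but I'd like to stay within regex since that is the tag I used. If this is a serious misuse of regex, please tell me.

Can anyone help me not only optimize this, but also explain the logic behind the suggested regex?

TL;DR: I need to match a leading pattern that contains a variable (80102000 in this example), ignore the \n characters, and keep going until a second pattern matches.

Pattern 1: /Orion/PatchDetails/process_form.+?release=80102000 ... I need the text in between ... Pattern 2: .*zip[^\"]*

Thanks in advance!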

Answer 1

The popular view is that parsing HTML with regex is not a good idea; see https://stackoverflow.com/a/1732454/9778302
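For comparison, here is a rough sketch of what a parser-based approach might look like. The answer above does not name a parser, so the choice of the standard library's html.parser is my assumption, as are the LinkCollector class and the shortened sample HTML. It collects the href of each link, then pairs the release from a PatchDetails link with the next Download link.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


# Made-up, shortened stand-in for two blocks of the HTML in the question
html = (
    "<td><a href='/Orion/PatchDetails/process_form?patch_num=6880880&aru=13915384&release=80101000&'>6880880</a></td>\n"
    "<td>10.1.0.0.0</td>\n"
    '<td><a href="https://updates.oracle.com/Orion/Download/process_form/p6880880_101000_Linux-x86-64.zip?aru=13915384&"></a></td></tr>\n'
    "<td><a href='/Orion/PatchDetails/process_form?patch_num=6880880&aru=13116068&release=80102000&'>6880880</a></td>\n"
    "<td>10.2.0.0.0</td>\n"
    '<td><a href="https://updates.oracle.com/Orion/Download/process_form/p6880880_102000_Linux-x86-64.zip?aru=13116068&"></a></td></tr>\n'
)

collector = LinkCollector()
collector.feed(html)

# Remember the release of the latest PatchDetails link and attach it
# to the next Download link that follows it.
downloads = {}
current_release = None
for href in collector.hrefs:
    parsed = urlparse(href)
    if parsed.path.endswith("/PatchDetails/process_form"):
        current_release = parse_qs(parsed.query).get("release", [None])[0]
    elif "/Orion/Download/" in parsed.path and current_release:
        downloads[current_release] = href

print(downloads)
```

This relies on link order rather than on counting lines, so it should not care how many line breaks sit between the two links.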

Answer 2

map(lambda line: re.search(expr,line), iterable_containing_lines)

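may do what you want. You get a map object (an iterable) with one result per line: a match object where the regex succeeded and None where it did not, so drop the None results to keep only the lines that matched.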
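A minimal sketch of that per-line approach, with a made-up pattern and made-up input lines (expr and the sample data are only for illustration). Keep in mind that searching line by line cannot match a pattern that spans several lines.

```python
import re

# Hypothetical pattern and input, for illustration only
expr = re.compile(r"release=(\d+)")
iterable_containing_lines = [
    "<a href='/Orion/PatchDetails/process_form?patch_num=6880880&release=80101000&'>6880880</a>",
    "<td>10.1.0.0.0</td>",
    "<a href='/Orion/PatchDetails/process_form?patch_num=6880880&release=80102000&'>6880880</a>",
]

# map() yields a match object or None for every line
results = map(expr.search, iterable_containing_lines)

# Keep only the successful matches and pull out the captured release number
releases = [m.group(1) for m in results if m is not None]
print(releases)
```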

Answer 3

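import re

# re.VERBOSE lets the pattern span several lines and carry comments;
# re.DOTALL lets "." match newlines as well.
regex = r"""
  Orion/PatchDetails/process_form.+?release=\d+
  (.+)   # use this group as your match (greedy, so it runs to the last "zip")
  zip[^\"]
  """

# re.compile returns a compiled pattern object, not the matches themselves
pattern = re.compile(regex, re.MULTILINE | re.DOTALL | re.VERBOSE)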

Add re.DOTALL to let . match \n as well. For your regex, this is what lets you match across multiple lines.

https://regex101.com/r/jBwq20/1
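To make the effect of re.DOTALL concrete, here is a small sketch that runs the same pattern with and without the flag. The HTML is a shortened, made-up stand-in for one block from the question, and the pattern is simplified for the demonstration.

```python
import re

# Shortened, made-up stand-in for one block of the HTML in the question
html = (
    "<a href='/Orion/PatchDetails/process_form?patch_num=6880880&release=80102000&'>6880880</a></td>\n"
    "<td>10.2.0.0.0</td>\n"
    '<a href="https://updates.oracle.com/Orion/Download/process_form/p6880880_102000_Linux-x86-64.zip?aru=13116068&">'
)

pattern = r"process_form.+?release=\d+.+?zip"

# Without DOTALL, "." stops at every newline, so the pattern cannot reach "zip"
without_dotall = re.search(pattern, html)

# With DOTALL, "." also matches "\n", so the match can run across the lines
with_dotall = re.search(pattern, html, re.DOTALL)

print(without_dotall)
print(with_dotall.group(0))
```

The first search finds nothing because release= and zip sit on different lines; the second returns one match that spans all three lines.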

Answer 4

I improved on this so that it works with a varying number of \n line breaks, and it has proven stable in my code as well:

regex = re.compile(r'/Orion/PatchDetails/process_form.+?release=' + re.escape(patch_info['Release']) + r'.*?"(https?://.*?)"', re.DOTALL)
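Putting the pieces together, the two steps from the question (list the releases, then use each one to find its download URL) might be sketched as follows. The download_url helper and the shortened sample HTML are hypothetical; in the real code the release would come from patch_info['Release'].

```python
import re

# Made-up, shortened stand-in for two blocks of the HTML in the question
html = (
    "<a href='/Orion/PatchDetails/process_form?patch_num=6880880&aru=13915384&release=80101000&'>6880880</a></td>\n"
    "<td>10.1.0.0.0</td>\n"
    '<a href="https://updates.oracle.com/Orion/Download/process_form/p6880880_101000_Linux-x86-64.zip?aru=13915384&"></a></td></tr>\n'
    "<a href='/Orion/PatchDetails/process_form?patch_num=6880880&aru=13116068&release=80102000&'>6880880</a></td>\n"
    "<td>10.2.0.0.0</td>\n"
    '<a href="https://updates.oracle.com/Orion/Download/process_form/p6880880_102000_Linux-x86-64.zip?aru=13116068&"></a></td></tr>\n'
)


def download_url(text, release):
    """Hypothetical helper: first quoted http(s) URL after the given release, or None."""
    pattern = re.compile(
        r"/Orion/PatchDetails/process_form.+?release="
        + re.escape(release)        # treat the release as literal text
        + r'.*?"(https?://.*?)"',   # lazily skip ahead to the next quoted URL
        re.DOTALL,                  # let "." cross line breaks
    )
    match = pattern.search(text)
    return match.group(1) if match else None


# Step 1: list the releases. Step 2: use each one as a search criterion.
releases = re.findall(r"PatchDetails/process_form\?\S*?release=(\d+)", html)
urls = {release: download_url(html, release) for release in releases}

for release, url in urls.items():
    print(release, url)
```

Capturing only the URL in a group avoids having to count the lines between the two patterns, which is what made the original `.*\n.*\n...` version fragile.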

4 answers

Answer 1 0 2018-05-16 18:21:02
Answer 2 0 2018-05-16 18:21:02
Answer 3 0 accepted 2018-05-16 20:28:38
Answer 4 0 2018-05-16 21:09:56
