Python regex - get the strings before and after a substring

Question

import re
txt = '<li>one. URL : <a href="http://local.ru">http://local.ru</a> (10.02.2022).</li><li>Two</li><li>Three. URL : <a href="https://local.ru">https://local.ru</a> (15.11.2021).</li>'
re.findall(r'(<li>.*?)\s?URL\s?:\s?(<a.*?>).*?(</a>.*?</li>)', txt)

I need to get this output:

[('<li>one.', '<a href="http://local.ru">', '</a> (10.02.2022).</li>'),
 ('<li>Three.', '<a href="https://local.ru">', '</a> (15.11.2021).</li>')]

Without the first pair of parentheses it works, but then it does not output the text.

Answer 1

It seems your regex was too permissive with .*? in the first group: it can run across tag boundaries, so the second match starts at <li>Two instead of <li>Three. If you restrict that part to non-tag characters with [^<>], you get the expected output.

import re

txt = (
    '<li>one. URL : <a href="http://local.ru">http://local.ru</a> (10.02.2022).</li>'
    '<li>Two</li>'
    '<li>Three. URL : <a href="https://local.ru">https://local.ru</a> (15.11.2021).</li>'
    )

re.findall(r"(<li>[^<>]*?)\s?URL\s?:\s?(<a[^>]*?>).*?(</a>.*?</li>)", txt)

gives

[('<li>one.', '<a href="http://local.ru">', '</a> (10.02.2022).</li>'),
 ('<li>Three.', '<a href="https://local.ru">', '</a> (15.11.2021).</li>')]
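To see the difference concretely, the sketch below runs both patterns on the same input and compares only the first captured group. The variable names (loose, strict, loose_first_groups, strict_first_groups) are made up for this illustration and do not come from the question or the answer.

```python
import re

txt = (
    '<li>one. URL : <a href="http://local.ru">http://local.ru</a> (10.02.2022).</li>'
    '<li>Two</li>'
    '<li>Three. URL : <a href="https://local.ru">https://local.ru</a> (15.11.2021).</li>'
)

# Pattern from the question: the first group uses .*?, which may also match "<" and ">".
loose = r'(<li>.*?)\s?URL\s?:\s?(<a.*?>).*?(</a>.*?</li>)'
# Pattern from the answer: [^<>]*? keeps the first group inside a single <li>.
strict = r'(<li>[^<>]*?)\s?URL\s?:\s?(<a[^>]*?>).*?(</a>.*?</li>)'

# Keep only the first element of each tuple returned by findall.
loose_first_groups = [m[0] for m in re.findall(loose, txt)]
strict_first_groups = [m[0] for m in re.findall(strict, txt)]

print(loose_first_groups)   # ['<li>one.', '<li>Two</li><li>Three.']
print(strict_first_groups)  # ['<li>one.', '<li>Three.']
```

With the loose pattern, the second match begins at <li>Two and the lazy .*? keeps expanding until it reaches the next URL, swallowing the whole <li>Two</li> item. With [^<>]*? the attempt starting at <li>Two fails at the first "<", and the engine moves on to <li>Three.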
