Regex: match any character across multiple lines

Question

I have an HTML string that looks like this:

<img src="blah blah blah"><p> blah blah
blah blah blah blah blah blah
blah blah blah</p>

How can I read the blah blah... text using regex? I tried (.+?) but it's not working, and I searched Google but didn't find a solution for Python.

Thanks!

Answer 1

With the usual disclaimers about using regex to parse HTML, this will work:

import re

# subject is the HTML string from the question
match = re.search(r"<img[^>]*><p>([^<]*)</p>", subject)
if match:
    blahblah = match.group(1)
    print(blahblah)

Explanation

<img matches literal chars
[^>]* matches any chars that are not >
><p> matches literal chars
([^<]*) captures any chars that are not < to Group 1 (this is what we want)
</p> matches literal chars
match.group(1) contains our string
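
The pattern above works across line breaks because a negated character class such as [^<] also matches newline characters, whereas . does not by default. The following is a minimal sketch of that difference, assuming the HTML from the question is stored in a variable named subject (the answer uses that name without defining it):

```python
import re

# Assumed input: the HTML string from the question.
subject = ('<img src="blah blah blah"><p> blah blah\n'
           'blah blah blah blah blah blah\n'
           'blah blah blah</p>')

# By default "." stops at a newline, so this lazy match fails on multi-line text.
dot_match = re.search(r"<p>(.+?)</p>", subject)

# [^<] matches any character except "<", newlines included.
class_match = re.search(r"<img[^>]*><p>([^<]*)</p>", subject)

print(dot_match)             # None
print(class_match.group(1))  # the three lines of paragraph text
```

One limitation of this approach: because [^<]* stops at the first <, the pattern would not match a paragraph that contains nested tags such as <b>.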

Answer 2

Here is an example in Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void testRegExp() {
    String input = "<img src=\"blah blah blah\"><p> blah blah"
            + "\n blah blah blah blah blah blah"
            + "\nblah blah blah</p>";
    // \s+ matches spaces and line breaks, so a run of words can span lines
    Pattern pMod = Pattern.compile("(blah\\s+)+");
    Matcher mMod = pMod.matcher(input);
    while (mMod.find()) {
        System.out.println("--------------");
        System.out.println(mMod.group(0));
    }
}

The output is:

blah blah

blah blah blah blah blah blah blah blah blah blah

For Python, I guess the regular expression is similar. Good luck and give it a try.
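
As a rough Python counterpart to the Java snippet, the sketch below reuses the same pattern with re.finditer. The variable names text, pattern, and matches are illustrative and do not come from the answer.

```python
import re

# Same input as the Java example.
text = ('<img src="blah blah blah"><p> blah blah'
        '\n blah blah blah blah blah blah'
        '\nblah blah blah</p>')

# \s matches spaces and newlines, so a run of words can span lines.
pattern = re.compile(r"(blah\s+)+")

matches = [m.group(0) for m in pattern.finditer(text)]
for found in matches:
    print("--------------")
    print(found)
```

Note that the first match comes from the src attribute rather than the paragraph, and the last "blah" of each run is left out because \s+ requires whitespace after every word. This pattern also only fits text made of the literal word "blah", so it does not generalize to arbitrary paragraph content.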

Answer 3

You could also try the code below, which uses the (?s) DOTALL modifier:

>>> s = """<img src="blah blah blah"><p> blah blah
... blah blah blah blah blah blah
... blah blah blah</p>"""
>>> import re
>>> m = re.search(r'(?s)(?<=<p>).*?(?=</p>)', s).group(0)
>>> print(m)
 blah blah
blah blah blah blah blah blah
blah blah blah
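
The inline (?s) modifier can also be written as the re.DOTALL flag (alias re.S). The sketch below compares the two forms and shows what happens without the flag; it uses a capture group instead of the lookarounds above, which is a stylistic substitution and not part of the original answer.

```python
import re

s = """<img src="blah blah blah"><p> blah blah
blah blah blah blah blah blah
blah blah blah</p>"""

# Inline modifier: "." matches newlines for the whole pattern.
inline = re.search(r"(?s)<p>(.*?)</p>", s)

# Equivalent form using the flags argument.
flagged = re.search(r"<p>(.*?)</p>", s, re.DOTALL)

# Without DOTALL, "." cannot cross the line breaks, so nothing matches.
without = re.search(r"<p>(.*?)</p>", s)

print(inline.group(1) == flagged.group(1))  # True
print(without)                              # None
```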

3 answers

Answer 1: score 2, accepted, 2014-07-31 02:58:25
Answer 2: score 0, 2014-07-31 02:39:49
Answer 3: score 0, 2014-07-31 03:55:17
