Python regex: finding the matched string

Question

I am trying to extract matching substrings from a string using regex in Python. The string looks like this:

band   1 # energy  -53.15719532 # occ.  2.00000000

ion      s      p      d    tot
  1  0.000  0.995  0.000  0.995
  2  0.000  0.000  0.000  0.000
tot  0.000  0.996  0.000  0.996

band   2 # energy  -53.15719532 # occ.  2.00000000

ion      s      p      d    tot
  1  0.000  0.995  0.000  0.995
  2  0.000  0.000  0.000  0.000
tot  0.000  0.996  0.000  0.996

band   3 # energy  -53.15719532 # occ.  2.00000000

My goal is to find the text that follows tot at the start of a line. So the matched strings should be something like:

['0.000  0.996  0.000  0.996', 
'0.000  0.996  0.000  0.996']

Here is my current code:

pattern = re.compile(r'tot\s+(.*?)\n', re.DOTALL)
pattern.findall(string)

However, the output gives me:

['1  0.000  0.995  0.000  0.995',
 '0.000  0.996  0.000  0.996',
 '1  0.000  0.995  0.000  0.995',
 '0.000  0.996  0.000  0.996']

Any idea what I am doing wrong?

Answer 1

You don't want the DOTALL flag. Remove it and use MULTILINE instead.

pattern = re.compile(r'^\s*tot(.*)', re.MULTILINE)

This matches all lines that start with tot (optionally preceded by whitespace). The rest of the line, including the spaces right after tot, will be in group 1.
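As a quick check, here is a minimal sketch that runs this pattern on an abbreviated copy of the sample from the question. The variable names `text`, `raw` and `cleaned` are only illustrative, and the `strip()` step is an addition to remove the leading spaces that group 1 keeps.

```python
import re

# Abbreviated copy of the sample text from the question.
text = (
    "ion      s      p      d    tot\n"
    "  1  0.000  0.995  0.000  0.995\n"
    "  2  0.000  0.000  0.000  0.000\n"
    "tot  0.000  0.996  0.000  0.996\n"
    "\n"
    "band   2 # energy  -53.15719532 # occ.  2.00000000\n"
)

# With MULTILINE, ^ matches at the start of every line, not only at the
# start of the whole string.
pattern = re.compile(r'^\s*tot(.*)', re.MULTILINE)

# Group 1 is everything after "tot" on that line, leading spaces included.
raw = pattern.findall(text)
cleaned = [rest.strip() for rest in raw]

print(raw)      # ['  0.000  0.996  0.000  0.996']
print(cleaned)  # ['0.000  0.996  0.000  0.996']
```

The header line is skipped because it starts with ion, so the tot at its end is never at the start of a line.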

Citing the documentation, emphasis mine:

re.DOTALL

Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
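The effect of the flag is easy to see on a tiny two-line string. This is only an illustrative sketch; the string and the variable names are made up for the demonstration.

```python
import re

text = "tot\nnext line"

# Without DOTALL, '.' stops at the newline.
without_flag = re.findall(r'tot.*', text)

# With DOTALL, '.' also matches the newline and everything after it.
with_flag = re.findall(r'tot.*', text, re.DOTALL)

print(without_flag)  # ['tot']
print(with_flag)     # ['tot\nnext line']
```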

Note that you can easily do this without regex.

with open("input.txt", "r") as data_file:
    for line in data_file:
        # split() with no argument splits on any whitespace and drops empty items
        items = line.split()
        if items and items[0] == "tot":
            print(items[1:])  # etc
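The same no-regex idea can be sketched as a small function that works on a string already held in memory and converts the columns to floats. The function name `tot_rows` and the abbreviated `sample` text are hypothetical, chosen only for this example.

```python
def tot_rows(text):
    """Return the numeric columns of every line whose first field is 'tot'."""
    rows = []
    for line in text.splitlines():
        fields = line.split()  # split on runs of whitespace
        if fields and fields[0] == "tot":
            rows.append([float(value) for value in fields[1:]])
    return rows


# Abbreviated copy of the sample text from the question.
sample = (
    "ion      s      p      d    tot\n"
    "  1  0.000  0.995  0.000  0.995\n"
    "  2  0.000  0.000  0.000  0.000\n"
    "tot  0.000  0.996  0.000  0.996\n"
)

print(tot_rows(sample))  # [[0.0, 0.996, 0.0, 0.996]]
```

Checking `fields` for emptiness first avoids an IndexError on blank lines.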

Answer 2

You are using re.DOTALL, which means that the dot "." will match anything, even newlines, in essence finding both occurrences of "tot" and everything that follows until the next newline:

                            tot
  1  0.000  0.995  0.000  0.995

and

tot  0.000  0.996  0.000  0.996

Removing re.DOTALL should fix your problem.

Edit: Actually, the DOTALL flag is not really the issue (though it is unnecessary). The problem in the pattern is that \s+ also matches the newline. Replacing it with a single space solves that issue:

pattern = re.compile(r'tot (.*?)\n')
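To make the difference concrete, here is a small sketch comparing the original separator, the single-space version, and a `[ \t]+` variant. The `[ \t]+` variant is not from the answer; it is an assumption about what is wanted when the number of spaces after tot can vary.

```python
import re

# Abbreviated copy of the sample text from the question.
text = (
    "ion      s      p      d    tot\n"
    "  1  0.000  0.995  0.000  0.995\n"
    "  2  0.000  0.000  0.000  0.000\n"
    "tot  0.000  0.996  0.000  0.996\n"
)

# \s+ also matches "\n", so the "tot" that ends the header line
# pulls in the following line.
with_s = re.findall(r'tot\s+(.*?)\n', text)

# A single literal space cannot cross the newline, but the sample has
# two spaces after "tot", so the capture keeps one leading space.
with_space = re.findall(r'tot (.*?)\n', text)

# Spaces and tabs only: stays on one line and consumes all the padding.
with_blank = re.findall(r'tot[ \t]+(.*?)\n', text)

print(with_s)      # ['1  0.000  0.995  0.000  0.995', '0.000  0.996  0.000  0.996']
print(with_space)  # [' 0.000  0.996  0.000  0.996']
print(with_blank)  # ['0.000  0.996  0.000  0.996']
```

All three patterns still require a newline after the captured text, so a tot line at the very end of the input without a trailing newline would not match.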

Answer 3

An alternative solution using the re.findall function with a more specific regex pattern:

# text is your initial string (named text to avoid shadowing the built-in str)
result = re.findall('tot [0-9 .]+(?=\n|$)', text)
print(result)

The output:

['tot  0.000  0.996  0.000  0.996', 'tot  0.000  0.996  0.000  0.996']
