Extracting a substring from a string in Python

Question

I am trying to extract a substring from the following string:

-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $

The substring I want to extract: $Revision: 1.14 (or just 1.14)

My code is as follows:

from sys import *
from os.path import *
import re 

script, filename = argv

print "Filename: %s\n" % filename

def check_string():
    found = False
    with open(filename) as f:
        for line in f:
            if re.search("(?<=\$Revision: ) 1.14", line):
                print line
                found = True
        if not found:
            print "No Header exists in %s" % filename

check_string()

This doesn't seem to work.

Any suggestions?

Thanks!

Answer 1

if re.search("(?<=\$Revision: ) 1.14", line):

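This line does not work because it tries to match two spaces between the colon and 1.14: one inside the lookbehind and one after it. Try: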

if re.search("(?<=\$Revision: )1.14", line):

or

if re.search("\$Revision:\s+1.14", line):
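
To see the difference between the patterns, here is a small sketch run against the second line of the sample header from the question. It is written for Python 3 and the variable names are only illustrative. It also escapes the dot in 1.14, since an unescaped . matches any character.

```python
import re

# Second physical line of the sample header from the question.
line = "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"

# Original pattern: one space inside the lookbehind plus one after it,
# so it needs two spaces before 1.14 and finds nothing.
original = re.search(r"(?<=\$Revision: ) 1\.14", line)

# Fixed patterns: a single space, or any run of whitespace.
lookbehind = re.search(r"(?<=\$Revision: )1\.14", line)
whitespace = re.search(r"\$Revision:\s+1\.14", line)

print(original)            # None
print(lookbehind.group())  # 1.14
print(whitespace.group())  # $Revision: 1.14
```

Note that the lookbehind version returns only the number, while the \s+ version returns the keyword as well.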

Answer 2

Your regular expression requires two spaces between the colon and the version number, but the input contains only one.

Answer 3

If I understand you correctly, a split should do what you want:

if "$Revision:" in line:
    print(line.split("$Revision: ")[1].split()[0])
1.14


In [6]: line ="""
   ...: -- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
   ...: ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $
   ...: """

In [7]: line.split("$Revision: ")  # split the line at $Revision: 
Out[7]: 
['\n-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p\nls,v $, ',
 '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n']

# we use indexing to get the first element after $Revision:  in the string
In [8]: line.split("$Revision: ")[1] 
# which becomes the substring below
Out[8]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'

# if we call split again we split that substring on whitespace into individual strings
In [10]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'.split()
Out[10]: ['1.14', '$,', '$Author:', '$,', '$Date:', '2014/09/23', '21:41:15', '$']

# using indexing again we extract the first element which is the  revision number
In [11]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'.split()[0]
Out[11]: '1.14'

The same works for $Date:

 date  = line.split("$Date: ")[1].split()[0]

Or just use in when you only want to check whether a substring is in the string:

if "$Revision: 1.14" in line:
    print(line)
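
The split approach can be wrapped in a small helper that also copes with lines lacking the keyword. This is only a sketch: the function name keyword_value is hypothetical and not part of the original answer.

```python
def keyword_value(line, keyword):
    """Return the first whitespace-delimited token after '$<keyword>: ', or None."""
    marker = "$%s: " % keyword
    if marker not in line:
        return None
    tokens = line.split(marker, 1)[1].split()
    return tokens[0] if tokens else None

line = "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"
print(keyword_value(line, "Revision"))  # 1.14
print(keyword_value(line, "Date"))      # 2014/09/23
print(keyword_value(line, "Name"))      # None
# Caveat: an empty keyword such as '$Author: $' yields '$,' here, not None.
print(keyword_value(line, "Author"))    # $,
```

As the last call shows, plain splitting cannot tell an empty keyword from a real value, which is one reason to prefer a regular expression when the header may contain empty fields.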

Answer 4

>>> import re
>>> string="""-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
... ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"""
>>> re.findall(r'\$Revision:\s*([0-9.]*)',string,re.DOTALL) # if more than one such value is to be searched
['1.14']   
>>> re.search(r'\$Revision:\s*([0-9.]*)',string,re.DOTALL).group(1) # if only one such value needs to be found
'1.14'
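
Two caveats apply to the session above: [0-9.]* can also match an empty string, and calling .group(1) directly on the result of re.search raises AttributeError when nothing matches. A possible tightening is sketched below. The pattern and the name find_revision are assumptions for illustration, not part of the original answer.

```python
import re

# Assumed stricter pattern: one or more dot-separated groups of digits.
REVISION_RE = re.compile(r"\$Revision:\s*(\d+(?:\.\d+)*)")

def find_revision(text):
    """Return the revision number as a string, or None when absent."""
    match = REVISION_RE.search(text)
    return match.group(1) if match else None

header = ("-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p\n"
          "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $")
print(find_revision(header))             # 1.14
print(find_revision("$Revision: $"))     # None
print(find_revision("no keyword here"))  # None
```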

Answer 5

import sys

def check_string(f,target):
    for line in f:
        if line.find(target)>=0:
            return line

script, filename = sys.argv

f = open(filename)
rev_line = check_string(f,'Revision: 1.14')
if rev_line:
    ...
else:
    ...

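The `check_string` function

No regular expression is needed.
line.find(target) returns -1 on failure and the index of target in line on success.
If the index is not less than 0 we have a match, so we return line.
If we find no match, we fall off the end of the function and return None.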
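
The return values of str.find described above can be checked with a short sketch (the sample strings are illustrative). It also shows why the test is >= 0 rather than plain truthiness: a match at the very start of the line returns 0.

```python
line = "ls,v $, $Revision: 1.14 $, $Author: $"

hit = line.find("Revision: 1.14")
miss = line.find("Revision: 2.0")
at_start = line.find("ls,v")

print(hit)       # 9, the index where the target begins
print(miss)      # -1, target not present
print(at_start)  # 0, a valid match that is falsy in a boolean test

# The `in` operator expresses the same membership check without indices.
print("Revision: 1.14" in line)  # True
```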

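The calling program

After the usual boilerplate, we assign the result of check_string to the variable rev_line. If 'Revision: 1.14' was not found, rev_line is None; otherwise it is the whole line containing the target. Then go on to do whatever is needed in each of the two cases.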

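Edit

If the revision number is not known when the program is written, there are two cases:

1. The revision number is read from a file, or otherwise computed, and is known at execution time:

    target = 'Revision: %d.%d' % (major, minor)
    rev_line = check_string(f, target)

2. The revision number is not fully known at check time. In that case you build a target string containing a regular expression and modify the inside of check_string: instead of if line.find(target)>=0: you write if re.search(target, line):. This is very similar to what you wrote in the first place, but the regular expression is no longer hard-coded in the function, and you are free to decide it in the main body of the program.

All in all, 2. is better, because you can always build a "constant" regular expression...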
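
A minimal sketch of the regex-based variant follows, written for Python 3. io.StringIO stands in for the opened file so the example runs on its own, and the use of re.escape to build the "constant" pattern is an assumption about what the answer has in mind.

```python
import io
import re

def check_string(f, pattern):
    """Return the first line of f matching the regular expression, else None."""
    for line in f:
        if re.search(pattern, line):
            return line

sample = ("-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p\n"
          "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n")

# Case 1: revision known at run time; re.escape makes it a literal pattern.
major, minor = 1, 14
target = re.escape('Revision: %d.%d' % (major, minor))
exact = check_string(io.StringIO(sample), target)

# Case 2: revision only partly known, for example any 1.x revision.
partial = check_string(io.StringIO(sample), r'Revision: 1\.\d+')

# A pattern that matches no line falls through and returns None.
missing = check_string(io.StringIO(sample), r'Revision: 2\.\d+')

print(exact is not None, partial is not None, missing)  # True True None
```

Keeping the pattern outside the function means the same check_string serves both cases. Only the caller decides whether the target is literal or a real regular expression.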

Answer 1
2 2014-11-08 23:09:49

Answer 2
1 2014-11-08 23:10:38

Answer 3
1 accepted 2014-11-08 23:23:17

Answer 4
0 2014-11-08 23:18:00

Answer 5
0 2014-11-09 00:09:29
