Extract Substring from String Python
I am trying to extract the following substring from a string:
-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $
The string I want to extract: $Revision: 1.14 (or just 1.14)
My code is as follows:
from sys import *
from os.path import *
import re

script, filename = argv
print "Filename: %s\n" % filename

def check_string():
    found = False
    with open(filename) as f:
        for line in f:
            if re.search("(?<=\$Revision: ) 1.14", line):
                print line
                found = True
    if not found:
        print "No Header exists in %s" % filename

check_string()
This doesn't seem to work.
Any suggestions?
Thanks!
if re.search("(?<=\$Revision: ) 1.14", line):
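Your line does not work because you are trying to match two spaces between the colon and 1.14: the lookbehind already ends with one space, and the pattern then asks for another. Try:
if re.search(r"(?<=\$Revision: )1\.14", line):
or
if re.search(r"\$Revision:\s+1\.14", line):
Your regular expression requires two spaces between the colon and the version number, but the input contains only one.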
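To see the difference concretely, here is a minimal sketch that runs the three patterns against the second header line from the question. The variable names are only illustrative, and the dot is escaped so that it matches a literal period rather than any character.

```python
import re

# Second line of the CVS header from the question.
line = "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"

# Original pattern: the lookbehind already ends with one space and the
# pattern then demands a second one, so nothing matches.
two_spaces = re.search(r"(?<=\$Revision: ) 1\.14", line)

# Fixed lookbehind: only the space inside the lookbehind is required.
lookbehind = re.search(r"(?<=\$Revision: )1\.14", line)

# \s+ accepts one or more whitespace characters after the colon.
flexible = re.search(r"\$Revision:\s+1\.14", line)

print(two_spaces)          # None
print(lookbehind.group())  # 1.14
print(flexible.group())    # $Revision: 1.14
```

Note that the lookbehind version returns only the number, while the \s+ version returns the keyword together with the number.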
If I understand you correctly, split should do what you want:
if "$Revision:" in line:
    print(line.split("$Revision: ")[1].split()[0])
1.14
In [6]: line ="""
...: -- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
...: ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $
...: """
In [7]: line.split("$Revision: ") # split the line at $Revision:
Out[7]:
['\n-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p\nls,v $, ',
'1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n']
# we use indexing to get the first element after $Revision: in the string
In [8]: line.split("$Revision: ")[1]
# which becomes the substring below
Out[8]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'
# if we call split again we split that substring on whitespace into individual strings
In [10]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'.split()
Out[10]: ['1.14', '$,', '$Author:', '$,', '$Date:', '2014/09/23', '21:41:15', '$']
# using indexing again we extract the first element which is the revision number
In [11]: '1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n'.split()[0]
Out[11]: '1.14'
The same works for $Date:
date = line.split("$Date: ")[1].split()[0]
Or just use in when you want to check whether a substring is present in the string:
if "$Revision: 1.14" in line:
    print(line)
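The split approach can be wrapped in a small helper so that it works for any keyword and does not raise when the keyword is absent. This is only a sketch: keyword_value is a hypothetical name, not part of the original answer, and it assumes the "$Keyword: value" layout shown in the question.

```python
def keyword_value(line, keyword):
    """Return the first whitespace-delimited token after '$keyword: ', or None."""
    marker = "$%s: " % keyword
    if marker not in line:
        return None
    # Split once at the marker, then split the remainder on whitespace.
    tokens = line.split(marker, 1)[1].split()
    return tokens[0] if tokens else None

line = "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"
print(keyword_value(line, "Revision"))  # 1.14
print(keyword_value(line, "Date"))      # 2014/09/23
print(keyword_value(line, "State"))     # None
```

One limitation to be aware of: an empty field such as "$Author: $" yields the token "$," rather than None, because split simply takes whatever follows the marker.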
>>> import re
>>> string="""-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p
... ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"""
>>> re.findall(r'\$Revision:\s*([0-9.]*)',string,re.DOTALL) # if more than one such value is to be searched
['1.14']
>>> re.search(r'\$Revision:\s*([0-9.]*)',string,re.DOTALL).group(1) # if only one such value needs to be found
'1.14'
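One caveat with calling .group(1) directly is that re.search returns None when nothing matches, which raises AttributeError. A possible guard is sketched below; find_revision and REVISION_RE are hypothetical names, and the pattern uses + instead of * so that an empty revision field does not count as a match.

```python
import re

# Capture the digits and dots that follow "$Revision:" and optional whitespace.
REVISION_RE = re.compile(r"\$Revision:\s*([0-9.]+)")

def find_revision(text):
    """Return the revision number found in text, or None if there is none."""
    match = REVISION_RE.search(text)
    return match.group(1) if match else None

header = (
    "-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p\n"
    "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $"
)
print(find_revision(header))              # 1.14
print(find_revision("no keywords here"))  # None
```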
import sys

def check_string(f, target):
    # Return the first line that contains target; None if no line does.
    for line in f:
        if line.find(target) >= 0:
            return line

script, filename = sys.argv
f = open(filename)
rev_line = check_string(f, 'Revision: 1.14')
if rev_line:
    ...
else:
    ...
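In the check_string function, line.find(target) returns -1 on failure and the index of target in line on success. If the index is not less than 0 we have a match, so we return line. If no line matches, we fall off the end of the function, which returns None.
After the usual boilerplate, we assign the result of check_string to the variable rev_line. If 'Revision: 1.14' was not found, rev_line is None; otherwise it is the whole line containing the target. Go on and do whatever is needed in each of the two cases.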
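The two outcomes can be tried without a file on disk. The sketch below assumes Python 3 and feeds check_string an in-memory io.StringIO holding the header from the question; the explicit return None only spells out what the function above already does implicitly.

```python
import io

def check_string(f, target):
    """Return the first line of f that contains target, or None."""
    for line in f:
        if line.find(target) >= 0:
            return line
    return None  # same result as falling off the end of the function

header = (
    "-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p\n"
    "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n"
)

found = check_string(io.StringIO(header), "Revision: 1.14")
missing = check_string(io.StringIO(header), "Revision: 2.0")

print(repr(found))  # the whole second line, including its trailing newline
print(missing)      # None
```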
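If the revision number is not known when you write the program, there are two cases:
1. The revision number is read from a file or otherwise computed, and is known at execution time:
target = 'Revision: %d.%d' % (major, minor)
rev_line = check_string(f, target)
2. The revision number is not completely known when you check. In this case you build a target string containing a regular expression and modify the body of check_string: in place of
if line.find(target)>=0:
you write
if re.search(target, line):
This is very similar to what you wrote in the first place, but the regular expression is no longer hard-coded in the function, and you are free to determine it in the main body of the program.
All in all, 2. is better, because you can always build a "constant" regular expression...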
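A minimal sketch of the regex-based variant follows, assuming Python 3 and an in-memory header in place of the real file. The pattern names exact and any_1x are illustrative; re.escape is used in the first case so that the dot in a computed revision number is matched literally.

```python
import io
import re

def check_string(f, target):
    """Return the first line of f that matches the regex target, or None."""
    for line in f:
        if re.search(target, line):
            return line
    return None

header = (
    "-- CVS Header: $Source: /CVS/oracle11i/database/erp/apps/pkgspec/wwt_prime_pkg.p\n"
    "ls,v $, $Revision: 1.14 $, $Author: $, $Date: 2014/09/23 21:41:15 $\n"
)

# Case 1: the revision is known at run time; escape it to get a "constant" regex.
major, minor = 1, 14
exact = re.escape("Revision: %d.%d" % (major, minor))

# Case 2: the revision is only partly known, for example any 1.x revision.
any_1x = r"Revision: 1\.\d+"

exact_line = check_string(io.StringIO(header), exact)
any_1x_line = check_string(io.StringIO(header), any_1x)
no_line = check_string(io.StringIO(header), r"Revision: 2\.\d+")

print(exact_line == any_1x_line)  # True, both find the same header line
print(no_line)                    # None
```

Because both cases go through the same function, the choice of pattern stays in the main body of the program, as the answer suggests.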