
Python regex: replace ignoring empty string

I am trying to replace a given pattern with regular expressions in Python, using re. Here is the piece of Python code I wrote:

import re

fname = './prec.f90'
f = open(fname)
lines = f.readlines()
f.close()
for i, line in enumerate(lines):
    search = re.findall('([\d*]?\.[\d*]?)+?[^dq\_]', line)
    if search != []: 
        print('Real found in line #%d: ' %i)
        print search
        print('The following line:\n %s' %line)
        print('will be replace by:')
        newline = re.sub('([\d*]?\.[\d*]?)+?[^dq\_]', r'\g<1>d0\g<2>', line)
        print('%s' %newline)

The file prec.f90 contains something like this (it is just an example; it does not mean that all the strings I want to replace have the form [x]_[yz] = ...;):

  x_pr = 0.1; y_pr = 0.2; z_pr = 0.1q0
  x_sp = 0.1; y_sp = 0.1d0; z_sp = 0.1q0
  x_dp = 0.1; y_dp = 0.1d0; z_dp = 0.1q0
  x_qp = .1; y_qp = 0.1d0; z_qp = 0.1q0
  x_db = 0.; y_db = 0.1d0; y_db = 0.1q0

My goal is to modify all literals such as 0.1, .1 and 0. so that they become something like 0.1d0; I don't want to modify the other patterns. The problem is that re.findall() with the pattern ([\d*]?\.[\d*]?)+?[^dq\_] matches the literals I am looking for, but also returns an empty string for the other ones. Therefore, when I run this piece of code, it fails: re.sub() is unable to substitute the first and second groups for the empty strings.

I guess one solution would be to ignore empty strings in re.sub, or to have something like a conditional argument in it, but I could not figure out how.

Any help would be appreciated!

You can simplify the sub as follows:

>>> import re
>>> s = "x_db = 0.; y_db = 0.1d0; y_db = 0.1q"
>>> re.sub(r'(0\.1|\.1|0\.)(?=;)', r'\g<1>0d0', s)
'x_db = 0.0d0; y_db = 0.1d0; y_db = 0.1q'

The regex (0\.1|\.1|0\.)(?=;) matches 0.1, .1 and 0. only when they are followed by a ; (the lookahead (?=;) checks for the semicolon without consuming it).
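As a quick check of how this lookahead-based substitution behaves on lines like those in prec.f90, here is a minimal sketch. The helper name append_kind is hypothetical and only used for illustration; the pattern and replacement are the ones quoted above.

```python
import re

# Pattern from the answer above: one of three literal spellings,
# accepted only when a ';' follows (the ';' itself is not consumed).
PATTERN = re.compile(r'(0\.1|\.1|0\.)(?=;)')

def append_kind(line):
    """Hypothetical helper: append '0d0' to the matched literals."""
    return PATTERN.sub(r'\g<1>0d0', line)

print(append_kind("x_qp = .1; y_qp = 0.1d0; z_qp = 0.1q0"))
print(append_kind("x_pr = 0.1; y_pr = 0.2; z_pr = 0.1q0"))
```

Since the alternation only lists 0.1, .1 and 0., a value such as 0.2 is left untouched, and so is a literal that is not followed by a semicolon.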

(x_[a-zA-Z]{2}\s*=)\s+[^;]+

Try this. Replace by \1 0.1d0. See the demo:

http://regex101.com/r/qZ6sE3/2

import re

# Capture the "x_?? =" prefix, then consume the value up to the next ';'
p = re.compile(r'(x_[a-zA-Z]{2}\s*=)\s+[^;]+')
test_str = "x_pr = 0.1; y_pr = 0.2; z_pr = 0.1q0\nx_sp = 0.1; y_sp = 0.1d0; z_sp = 0.1q0\nx_dp = 0.1; y_dp = 0.1d0; z_dp = 0.1q0\nx_qp = .1; y_qp = 0.1d0; z_qp = 0.1q0\nx_db = 0.; y_db = 0.1d0; y_db = 0.1q0"
# Raw string, so that \1 reaches re.sub as a backreference and not as chr(1)
subst = r"\1 0.1d0"

result = re.sub(p, subst, test_str)
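One detail worth checking in snippets like this one is how the replacement string is written: in an ordinary (non-raw) Python string, "\1" is an octal escape for the control character chr(1), so the backreference only works if the backslash actually reaches re.sub. A minimal sketch, using a shortened sample line as an assumption:

```python
import re

pattern = re.compile(r'(x_[a-zA-Z]{2}\s*=)\s+[^;]+')
line = "x_qp = .1; y_qp = 0.1d0"

# Raw string: re.sub sees a backslash followed by 1, i.e. a backreference.
with_raw = pattern.sub(r"\1 0.1d0", line)

# Non-raw string: Python turns "\1" into chr(1) before re.sub ever sees it,
# so the captured prefix is lost and a control character is inserted instead.
without_raw = pattern.sub("\1 0.1d0", line)

print(repr(with_raw))
print(repr(without_raw))
```

Printing with repr makes the stray control character visible in the second result.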

I finally came up with this piece of code, which works as intended:

import re

fname = './prec.f90'
with open(fname) as f:
    lines = f.readlines()

# Each line still ends with a newline character (\n), so a literal at the end
# of a line is always followed by something. Without it we would also need to
# accept the end of the line, with something like ([^dq_0-9]|$).
regex = re.compile(r'(\d*\.\d*)([^dq_0-9])')
for i, line in enumerate(lines):
    search = regex.findall(line)
    if search:
        print('Real found in line #%d:' % i)
        print(search)
        print('The following line:\n %s' % line)
        print('will be replaced by:')
        # Group 1 is the literal, group 2 the character that follows it
        newline = regex.sub(r'\g<1>d0\g<2>', line)
        print(newline)
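The comment in the code mentions that, without a trailing newline, the end of the line would have to be accepted as well. A minimal sketch of that variant follows; the function name promote_reals is hypothetical, and the pattern is the same one with |$ added to the second group, as the comment suggests.

```python
import re

# Same idea as above, but the second group may also match the end of the
# string, so a literal that ends the line (no '\n' after it) is still found.
REAL = re.compile(r'(\d*\.\d*)([^dq_0-9]|$)')

def promote_reals(line):
    """Hypothetical helper: append d0 to reals that carry no d/q/_ suffix."""
    return REAL.sub(r'\g<1>d0\g<2>', line)

print(promote_reals("x = 0.1"))
print(promote_reals("x_sp = 0.1; y_sp = 0.1d0; z_sp = 0.1q0"))
print(promote_reals("x_db = 0.; y = .1"))
```

Note that, like the original pattern, this also matches a lone dot (for instance in a Fortran operator such as .and.), so it is only a sketch for lines shaped like the sample file.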

I first came up with the more complicated regex ([\d*]?\.[\d*]?)+?[^dq\_] because otherwise I always matched the first part of any literal ending with d, q or _. It seemed to be due to the fact that \d* was not greedy enough; adding 0-9 to the "ignore" set solves the problem.
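A small sketch of what seems to be going on here: \d* is greedy, but when the character after the digits is d, q or _, the regex engine backtracks, gives one digit back, and lets the negated character class match that digit instead. Excluding 0-9 from the class closes that escape route. The names loose and strict are only illustrative.

```python
import re

# Without digits in the excluded set, the last digit can be "donated"
# to the second group after backtracking.
loose = re.compile(r'(\d*\.\d*)([^dq_])')
# With digits excluded, the literal must be followed by a real terminator.
strict = re.compile(r'(\d*\.\d*)([^dq_0-9])')

line = "y_sp = 0.1d0;"

print(loose.findall(line))                    # matches '0.' followed by '1'
print(loose.sub(r'\g<1>d0\g<2>', line))       # corrupts the literal
print(strict.findall(line))                   # no match: literal left alone
```

On this sample line the loose pattern splits 0.1 into 0. and 1, which is exactly the partial match described above, while the strict pattern finds nothing to replace.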
