Python RegEx nested search and replace

Question

I need to do a RegEx search and replace of all commas found inside quoted blocks.
For example,

"thing1,blah","thing2,blah","thing3,blah",thing4

needs to become

"thing1\,blah","thing2\,blah","thing3\,blah",thing4

My code:

import re

with open(inFileName, 'r') as inFile:
    inFileRl = inFile.readlines()

p = re.compile(r'["]([^"]*)["]')
for line in inFileRl:
    pg = p.search(line)
    # found quoted block
    if pg:
        q  = re.compile(r'[^\\],')
        # found comma within quoted block
        qg = q.search(pg.group(0))
        if qg:
            # Here I want to reconstitute the line and print it with the replaced text
            #print re.sub(r'([^\\])\,',r'\1\,',pg.group(0))
            pass

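I just need to pick out the columns I want with a RegEx, filter them further,
then do the RegEx replace, and then reconstitute the line.

How can I do this in Python?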

Answer 1

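The csv module is perfect for parsing data like this: csv.reader in the default dialect ignores commas inside quotes, and csv.writer reinserts the quotes because the fields contain commas. I used StringIO to give the strings a file-like interface.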

import csv
import io

s = '''"thing1,blah","thing2,blah","thing3,blah"
"thing4,blah","thing5,blah","thing6,blah"'''
source = io.StringIO(s)   # file-like wrapper around the input string
dest = io.StringIO()      # file-like buffer that collects the output
rdr = csv.reader(source)
wtr = csv.writer(dest)
for row in rdr:
    # Undo existing escapes first so already-escaped commas are not escaped twice.
    wtr.writerow([item.replace('\\,',',').replace(',','\\,') for item in row])
print(dest.getvalue())

Result:

"thing1\,blah","thing2\,blah","thing3\,blah"
"thing4\,blah","thing5\,blah","thing6\,blah"

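As a minimal sketch, here is the same csv round trip applied to the sample line from the question, which mixes quoted and unquoted fields. The helper name escape_commas_csv is made up for illustration, and it assumes the input is well-formed CSV in the default dialect.

```python
import csv
import io


def escape_commas_csv(text):
    """Escape commas inside fields with a backslash (hypothetical helper).

    Assumes well-formed CSV in the default dialect.
    """
    source = io.StringIO(text, newline='')
    dest = io.StringIO(newline='')
    writer = csv.writer(dest, lineterminator='\n')
    for row in csv.reader(source):
        # Undo existing escapes first so commas are not escaped twice.
        writer.writerow([item.replace('\\,', ',').replace(',', '\\,') for item in row])
    return dest.getvalue()


line = '"thing1,blah","thing2,blah","thing3,blah",thing4'
print(escape_commas_csv(line), end='')
# "thing1\,blah","thing2\,blah","thing3\,blah",thing4
```

Note that the writer only quotes fields that need it, so a field that was quoted in the input but contains no comma comes back unquoted.
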
Answer 2

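General edit

The question used to contain

"thing1\\,blah","thing2\\,blah","thing3\\,blah",thing4

but that is no longer there.

I also had not noticed the pattern r'[^\\],' .

So I have completely rewritten my answer.

"thing1,blah","thing2,blah","thing3,blah",thing4

and

"thing1\,blah","thing2\,blah","thing3\,blah",thing4

are the strings as displayed (I suppose).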

import re


ss = '"thing1,blah","thing2,blah","thing3\\,blah",thing4 '

regx = re.compile('"[^"]*"')

def repl(mat, ri = re.compile('(?<!\\\\),') ):
    # Escape every comma in the quoted block that is not already preceded by a backslash.
    return ri.sub('\\\\,',mat.group())

print(ss)
print(repr(ss))
print()
print(     regx.sub(repl, ss))
print(repr(regx.sub(repl, ss)))

Result

"thing1,blah","thing2,blah","thing3\,blah",thing4 
'"thing1,blah","thing2,blah","thing3\\,blah",thing4 '

"thing1\,blah","thing2\,blah","thing3\,blah",thing4 
'"thing1\\,blah","thing2\\,blah","thing3\\,blah",thing4 '
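
The same nested substitution can be wrapped in one small function, sketched below. The names QUOTED, BARE_COMMA and escape_quoted_commas are made up for illustration, and the sketch assumes quoted blocks never contain an embedded double quote.

```python
import re

QUOTED = re.compile(r'"[^"]*"')        # one quoted block, no embedded quotes
BARE_COMMA = re.compile(r'(?<!\\),')   # a comma not preceded by a backslash


def escape_quoted_commas(line):
    """Return line with unescaped commas inside quoted blocks escaped."""
    # The outer sub finds each quoted block; the inner sub rewrites only that block.
    return QUOTED.sub(lambda m: BARE_COMMA.sub(r'\\,', m.group()), line)


print(escape_quoted_commas('"thing1,blah","thing2,blah","thing3\\,blah",thing4'))
# "thing1\,blah","thing2\,blah","thing3\,blah",thing4
```

Because the negative lookbehind skips commas that are already escaped, running the function a second time on its own output changes nothing.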

Answer 3

You can try this regular expression.


>>> import re
>>> print(re.sub(r'(?<!"),(?!")', r"\\,",
...              '"thing1,blah","thing2,blah","thing3,blah",thing4'))
"thing1\,blah","thing2\,blah","thing3\,blah",thing4

The logic behind this is to replace a , with \, only when it is neither immediately preceded nor immediately followed by a "
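
A short sketch of where this lookaround rule holds and where it does not. The pattern never checks that a comma is actually inside quotes, only what its neighbours are. The function name escape_by_lookaround and the extra sample strings are made up for illustration.

```python
import re


def escape_by_lookaround(line):
    # Replace a comma unless a double quote sits directly before or after it.
    return re.sub(r'(?<!"),(?!")', r'\\,', line)


# Works on data shaped like the question's sample.
print(escape_by_lookaround('"thing1,blah",thing4'))   # "thing1\,blah",thing4

# A separator between two unquoted fields is escaped as well.
print(escape_by_lookaround('thing4,thing5,"a,b"'))    # thing4\,thing5,"a\,b"

# A comma right next to a quote inside a quoted block is left alone.
print(escape_by_lookaround('",a","b,"'))              # ",a","b,"
```

So the one-liner fits when at most one unquoted field sits at either end of the line and no quoted field starts or ends with a comma.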

Answer 4

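I came up with an iterative solution using several regex functions:
finditer(), findall(), group(), start() and end().
There is a way to turn all of this into a recursive function that calls itself.
Any takers?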

import re

outfile = open(outfileName, 'w')

p = re.compile(r'["]([^"]*)["]')   # a quoted block
q = re.compile(r'([^\\])(,)')      # a comma not already preceded by a backslash
for line in outfileRl:
    pg = p.finditer(line)
    pglen = len(p.findall(line))

    if pglen > 0:
        mpgend = 0

        for i, mpg in enumerate(pg):
            if i == 0:
                # text before the first quoted block
                outfile.write(line[:mpg.start()])

            qg = q.finditer(mpg.group(0))
            qglen = len(q.findall(mpg.group(0)))

            if i > 0 and i < pglen:
                # text between the previous quoted block and this one
                outfile.write(line[mpgend:mpg.start()])

            if qglen > 0:
                mqgend = 0
                for mqg in qg:
                    # text since the previous comma match, then the escaped comma
                    outfile.write(mpg.group(0)[mqgend:mqg.start()])
                    outfile.write(q.sub(r'\1\\\2', mqg.group(0)))
                    mqgend = mqg.end()
                # rest of the quoted block after the last comma match
                outfile.write(mpg.group(0)[mqgend:])
            else:
                outfile.write(mpg.group(0))

            if i == (pglen-1):
                # text after the last quoted block
                outfile.write(line[mpg.end():])

            mpgend = mpg.end()
    else:
        outfile.write(line)

outfile.close()
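
The slice bookkeeping above (text before, between and after the quoted blocks) can also be handed to re.split, sketched below as one possible shortening rather than the recursive version asked about. The names escape_line and escape_stream are made up for illustration, and io.StringIO stands in for the real input and output files.

```python
import io
import re

QUOTED_BLOCK = re.compile(r'("[^"]*")')   # the capturing group keeps the blocks in the split result
BARE_COMMA = re.compile(r'(?<!\\),')      # a comma not preceded by a backslash


def escape_line(line):
    parts = QUOTED_BLOCK.split(line)
    # With one capturing group, the odd indexes hold the quoted blocks.
    for i in range(1, len(parts), 2):
        parts[i] = BARE_COMMA.sub(r'\\,', parts[i])
    return ''.join(parts)


def escape_stream(src, dst):
    # Works on any pair of file-like objects, one line at a time.
    for line in src:
        dst.write(escape_line(line))


src = io.StringIO('"a,b","c,d",e\nf,g\n')
dst = io.StringIO()
escape_stream(src, dst)
print(dst.getvalue(), end='')
# "a\,b","c\,d",e
# f,g
```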

Answer 5

Have you looked into str.replace()?

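str.replace(old, new[, count]) returns a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.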
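
A small sketch of what that means for the question's data: str.replace has no notion of quoted blocks, so on its own it escapes the field separators too. The sample string is shortened from the question for illustration.

```python
line = '"thing1,blah",thing4'

# Every comma is replaced, including the separator outside the quotes.
print(line.replace(',', '\\,'))      # "thing1\,blah"\,thing4

# With count=1 only the first occurrence is replaced.
print(line.replace(',', '\\,', 1))   # "thing1\,blah",thing4
```

It is still useful once something else (a regex or the csv module) has isolated the quoted fields.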

Here is some documentation: see str.replace() in the Python standard library reference.

Hope this helps.

5 answers

Answer 1: 3, accepted, 2011-10-04 17:57:16
Answer 2: 1, 2011-10-04 17:14:28
Answer 3: 0, 2011-10-04 17:17:41
Answer 4: 0
Answer 5: 0, 2011-10-04 21:18:54
