Regular expressions in a Python find-and-replace script? Update

I'm new to Python scripting, so please forgive me in advance if the answer to this question seems obvious.

I'm trying to put together a large-scale find-and-replace script using Python. I'm using code similar to the following:

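import sys

infile = sys.argv[1]    # path of the file to process
charenc = sys.argv[2]   # encoding of the input file
outFile = infile + '.output'

# (search, replacement) pairs, applied in order
findreplace = [
    ('term1', 'term2'),
]

# Read the raw bytes and decode them with the given encoding
inF = open(infile, 'rb')
s = inF.read().decode(charenc)
inF.close()

for couple in findreplace:
    outtext = s.replace(couple[0], couple[1])
    s = outtext

# s holds the final text even if findreplace is empty
outF = open(outFile, 'wb')
outF.write(s.encode('utf-8'))
outF.close()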

How would I go about having the script do a find and replace with regular expressions?

Specifically, I want it to find some information (metadata) specified at the top of a text file. E.g.:

Title: This is the title
Author: This is the author
Date: This is the date

and convert it into LaTeX format. E.g.:

\title{This is the title}
\author{This is the author}
\date{This is the date}

Maybe I'm tackling this the wrong way. If there's a better way than regular expressions, please let me know!

Thanks!

Update: Thanks for posting some example code in your answers! I can get the regex code to work as long as it replaces the findreplace action, but I can't get both to work together. The problem now is that I can't integrate it properly into the code I've got. How would I go about having the script do multiple actions on 'outtext' in the snippet below?

for couple in findreplace:
    outtext=s.replace(couple[0],couple[1])
    s=outtext

>>> import re
>>> s = """Title: This is the title
... Author: This is the author
... Date: This is the date"""
>>> p = re.compile(r'^(\w+):\s*(.+)$', re.M)
>>> print(p.sub(r'\\\1{\2}', s))
\Title{This is the title}
\Author{This is the author}
\Date{This is the date}

To change the case, use a function as the replacement parameter:

def repl_cb(m):
    # Lowercase the key (group 1); the value (group 2) is left unchanged
    return "\\%s{%s}" % (m.group(1).lower(), m.group(2))

p = re.compile(r'^(\w+):\s*(.+)$', re.M)
print(p.sub(repl_cb, s))

\title{This is the title}
\author{This is the author}
\date{This is the date}
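
One caveat with the pattern above is that it rewrites any line of the form `Word: text`, not only the metadata lines. A minimal sketch of a callback that converts only known keys and leaves every other line alone; the `KNOWN_KEYS` set, the `HEADER` name, and the `to_latex` function are illustrative assumptions, not part of the original answer:

```python
import re

# Hypothetical whitelist of metadata keys to convert
KNOWN_KEYS = {"title", "author", "date"}

HEADER = re.compile(r'^(\w+):\s*(.+)$', re.M)

def to_latex(m):
    key = m.group(1).lower()
    if key not in KNOWN_KEYS:
        # Not a metadata line: return the matched text unchanged
        return m.group(0)
    return "\\%s{%s}" % (key, m.group(2))

text = "Title: This is the title\nNote: keep this line\nDate: This is the date"
result = HEADER.sub(to_latex, text)
print(result)
```

Because the replacement is a function, its return value is inserted literally, so a single backslash in the returned string is all that is needed.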

The regular expression you want would probably be along the lines of this one:

^([^:]+): (.*)

and the replacement expression would be:

\\\1{\2}
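
In Python, this pattern and replacement could be applied with `re.sub`. A minimal sketch, assuming the `re.MULTILINE` flag so that `^` matches at the start of every line; the variable names are illustrative. Unlike a callback that lowercases the key, this keeps the key's original capitalization:

```python
import re

text = """Title: This is the title
Author: This is the author
Date: This is the date"""

# r'\\\1{\2}' is a literal backslash, then group 1, then group 2 in braces
converted = re.sub(r'^([^:]+): (.*)', r'\\\1{\2}', text, flags=re.MULTILINE)
print(converted)
```

Note that `[^:]` also matches newlines, so on text containing lines without a colon the first group can span several lines; `[^:\n]+` is a stricter variant that stays on one line.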

>>> import re
>>> m = 'title', 'author', 'date'
>>> s = """Title: This is the title
... Author: This is the author
... Date: This is the date"""
>>> for i in m:
...     s = re.compile(i+': (.*)', re.I).sub(r'\\' + i + r'{\1}', s)
...
>>> print(s)
\title{This is the title}
\author{This is the author}
\date{This is the date}
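
Coming back to the update in the question: one way to run several kinds of replacement on the same text is to keep reassigning a single variable, applying the literal pairs first and the regex substitutions afterwards. A minimal sketch in Python 3, assuming a hypothetical `regex_replace` list alongside the question's `findreplace` list and a hypothetical `convert` function; file reading and writing are left out:

```python
import re

# Literal (search, replacement) pairs, as in the question
findreplace = [
    ('term1', 'term2'),
]

# Hypothetical list of (compiled pattern, replacement) pairs
regex_replace = [
    (re.compile(r'^(title|author|date):\s*(.+)$', re.I | re.M),
     lambda m: '\\%s{%s}' % (m.group(1).lower(), m.group(2))),
]

def convert(s):
    # Each step works on the result of the previous one
    for old, new in findreplace:
        s = s.replace(old, new)
    for pattern, repl in regex_replace:
        s = pattern.sub(repl, s)
    return s

sample = "Title: About term1\nAuthor: This is the author\nBody with term1."
print(convert(sample))
```

In the question's script, something like `outtext = convert(s)` could stand in for the `for couple in findreplace` loop, with the reading and writing code left as it is.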
