Python equivalent of the bash command "sed -f"

Question

I have a set of regular expressions for substitution in a file (sed.clean), as follows:

#!/bin/sed -f
s/https\?:\/\/[^ ]*//g
s/\.//g
s/\"//g
s/\,//g
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/

and some more lines like those. I want to use this file to 'clean' a set of text files. To do this in bash I'd do something like this:

for file in $(ls rootDirectory)
do
    sed -f sed.clean $file > OUTPUT_FILE
done

How could I do something similar in Python?

What I mean is: is it possible to reuse the n REs I have in the sed.clean file (or rewrite them in the proper Python format), so that I avoid building a nested loop that compares each file with each RE, and instead just compare each file with a sed.clean Python file, as I do in bash? Something like this:

files = [ f for f in listdir(dirPath) if isfile(join(dirPath,f)) ]
for file in files:
    newTextFile = re.sub(sed.clean, file)
    saveTextFile(newTextFile, outputPath)

instead of this:

REs = ['s/https\?:\/\/[^ ]*//g', 's/\.//g',...,'y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/']
files = [ f for f in listdir(dirPath) if isfile(join(dirPath,f)) ]
for file in files:
    for re in REs:
        newTextFile = re.sub(re, '', file)
        saveTextFile(newTextFile, outputPath)

Thanks!

Answer 1

These sed patterns appear to blank out lines matching certain patterns from a file. In Python, readlines(), filter() and re.sub() would be your best pick.
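As a rough sketch of how those three pieces could fit together: the helper below (clean_lines is a hypothetical name, not something from the question) strips the punctuation with re.sub() and then uses filter() to drop lines that end up empty. It assumes the lines come from something like open(path).readlines().

```python
import re

def clean_lines(lines):
    """Remove '.', '"' and ',' from each line, then drop lines left blank."""
    stripped = (re.sub(r'[.",]', "", line) for line in lines)
    # filter() keeps only lines that still contain non-whitespace text
    return list(filter(lambda line: line.strip(), stripped))

# Stand-in for the result of file_handle.readlines()
sample = ['a.b,\n', '...\n', '"x"\n']
print(clean_lines(sample))
```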

Answer 2

Try re.sub like this:

>>> import re
>>> re.compile(r'\.')
<_sre.SRE_Pattern object at 0x9d48c80>
>>> MY_RE = re.compile(r'\.')
>>> MY_RE.sub('','www.google.com')
'wwwgooglecom'

You can compile any regex with re.compile().
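One way this could address the nested-loop concern from the question is to compile the rules once into a list and hide the inner loop in a single function. This is only a sketch: RULES and clean are illustrative names, and the patterns are hand-translated from the sed.clean lines shown in the question.

```python
import re

# (compiled pattern, replacement) pairs, applied in order like a sed script
RULES = [
    (re.compile(r"https?://[^ ]*"), ""),
    (re.compile(r"\."), ""),
    (re.compile(r'"'), ""),
    (re.compile(r","), ""),
]

def clean(text):
    """Apply every rule in turn, then lowercase (the y/A-Z/a-z/ step)."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text.lower()

print(clean('Go to www.Google.com, "now"'))  # go to wwwgooglecom now
```

The per-file loop then only needs to call clean() on each file's contents.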

Answer 3

You'll have to convert your sed script replacements to Python equivalents.

s/<pattern>/<replacement>/<flags>
# is equivalent to
re.sub("<pattern>", "<replacement>", <input>, flags=<python-flags>)

Note that re.sub replaces every match by default, so there's no need for /g at the end of the pattern. Moreover, you should not include flags in the pattern, as they are passed as a separate parameter. For example:

re.sub(r"\.", "", "a.b.c.d", flags=re.MULTILINE)
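A short sketch of how the common sed flags seem to map onto re.sub arguments. The variable names are only for illustration; count and flags are regular re.sub keyword arguments.

```python
import re

text = "a.b.c.d"

# sed: s/\.//g   -> re.sub already replaces every match
all_removed = re.sub(r"\.", "", text)

# sed: s/\.//    -> count=1 stops after the first match
first_removed = re.sub(r"\.", "", text, count=1)

# sed: s/abc//gI -> case-insensitivity is passed through flags
no_abc = re.sub(r"abc", "", "xABCyabc", flags=re.IGNORECASE)

# GNU sed writes \? for "optional"; Python writes a bare ?
no_urls = re.sub(r"https?://[^ ]*", "", "see http://a.b/c and https://d.e now")

print(all_removed, first_removed, no_abc, repr(no_urls))
```

Note the last case: the pattern itself has to be adapted, because sed's default (basic) regex syntax escapes some operators that Python leaves bare.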

y/<source-chars>/<dest-chars>/
# is equivalent to
trans = str.maketrans("<pattern>", "<replacement>")
<input>.translate(trans)

But in the case of y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/ it's just as simple as <input>.lower().
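A small sketch of the translation-table approach, using the standard string constants in place of the spelled-out alphabets (to_lower is just an illustrative name). It also shows one difference worth knowing: the table only touches the characters listed, whereas lower() also lowercases non-ASCII letters.

```python
import string

# Same mapping as y/ABC...XYZ/abc...xyz/; both strings must have equal length
to_lower = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

ascii_result = "Hello, World".translate(to_lower)

# Characters outside the table are left alone by translate()
table_result = "ÀB".translate(to_lower)
lower_result = "ÀB".lower()

print(ascii_result, table_result, lower_result)
```

For plain ASCII input the two approaches give the same result, so lower() remains the simpler choice there.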

for file in $(ls rootDirectory) is roughly equivalent to:

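root = '<rootDirectory>'
# join with root so isfile() checks the right path, not the current directory
files = [f for f in os.listdir(root) if os.path.isfile(os.path.join(root, f))]
for f in files:
    pass  # do something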

All together:

import os # don't forget to import required modules
import re

def process(line):
    result = line
    result = re.sub(r'"', "", result)
    result = re.sub(r"\.", "", result)
    # do all the stuff your sed script does and then
    return result

files = [f for f in os.listdir('.') if os.path.isfile(f)]
# 'with' closes both files even if processing fails part-way
with open('C:\\temp\\output.txt', 'w') as output_file:
    for file_name in files:
        with open(file_name, 'r') as file_handle:
            for line in map(process, file_handle):
                output_file.write(line)

Refer to the Python documentation for regex and file operations for details.

You might want to try converting your sed script to Python automatically, but if it's a one-time requirement it's simpler to do it by hand.
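For the automatic route, here is a deliberately limited sketch of what such a converter might look like. All function names are hypothetical. It assumes the script contains only s and y commands, one per line, and translates only a handful of GNU basic-regex escapes (such as \? and \+). Bracket expressions, addresses, sed's & in replacements and other commands are not handled.

```python
import re

BRE_SWAP = set("?+|(){}")  # escaped = operator in sed BRE, bare = operator in Python

def split_unescaped(body, delim):
    """Split on delim, treating backslash-delim as a literal delimiter."""
    parts, current, i = [], [], 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            current.append(nxt if nxt == delim else ch + nxt)
            i += 2
        elif ch == delim:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts

def bre_to_python(pattern):
    """Swap the escaping of the operators listed in BRE_SWAP."""
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            out.append(nxt if nxt in BRE_SWAP else ch + nxt)
            i += 2
        else:
            out.append("\\" + ch if ch in BRE_SWAP else ch)
            i += 1
    return "".join(out)

def compile_sed_script(text):
    """Turn s/// and y/// lines into a list of str -> str functions."""
    steps = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skips blank lines, comments and the #! line
        if len(line) < 2:
            raise ValueError("unsupported sed command: " + line)
        cmd, delim = line[0], line[1]
        parts = split_unescaped(line[2:], delim)
        if cmd == "s" and len(parts) == 3:
            pattern, repl, flags = parts
            regex = re.compile(bre_to_python(pattern),
                               re.IGNORECASE if "i" in flags.lower() else 0)
            count = 0 if "g" in flags else 1  # without g, only the first match
            steps.append(lambda s, r=regex, p=repl, c=count: r.sub(p, s, count=c))
        elif cmd == "y" and len(parts) == 3:
            table = str.maketrans(parts[0], parts[1])
            steps.append(lambda s, t=table: s.translate(t))
        else:
            raise ValueError("unsupported sed command: " + line)
    return steps

def sed_line(steps, line):
    """Apply every step to one line (pass it without its trailing newline)."""
    for step in steps:
        line = step(line)
    return line

SED_CLEAN = r'''#!/bin/sed -f
s/https\?:\/\/[^ ]*//g
s/\.//g
s/\"//g
s/\,//g
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
'''
steps = compile_sed_script(SED_CLEAN)
print(sed_line(steps, 'See https://Example.com/x.y for Details, "Now".'))
# see  for details now
```

Since sed works on lines with the newline removed, strip it before calling sed_line and add it back when writing; otherwise a pattern like [^ ]* would swallow the line ending.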
