
How to do sed-like text replacement with Python?

I would like to enable all apt repositories in this file:

cat /etc/apt/sources.list
## Note, this file is written by cloud-init on first boot of an instance                                                                                                            
## modifications made here will not survive a re-bundle.                                                                                                                            
## if you wish to make changes you can:                                                                                                                                             
## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg                                                                                                                
##     or do the same in user-data
## b.) add sources in /etc/apt/sources.list.d                                                                                                                                       
#                                                                                                                                                                                   

# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to                                                                                                           
# newer versions of the distribution.                                                                                                                                               
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick main                                                                                                                   
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick main                                                                                                               

## Major bug fix updates produced after the final release of the                                                                                                                    
## distribution.                                                                                                                                                                    
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates main                                                                                                           
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates main                                                                                                       

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu                                                                                                         
## team. Also, please note that software in universe WILL NOT receive any                                                                                                           
## review or updates from the Ubuntu security team.                                                                                                                                 
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick universe                                                                                                               
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick universe                                                                                                           
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates universe
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates universe

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu 
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in 
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick multiverse
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates multiverse

## Uncomment the following two lines to add software from the 'backports'
## repository.
## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
## Also, please note that software in backports WILL NOT receive any review
## or updates from the Ubuntu security team.
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-backports main restricted universe multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-backports main restricted universe multiverse

## Uncomment the following two lines to add software from Canonical's
## 'partner' repository.
## This software is not part of Ubuntu, but is offered by Canonical and the
## respective vendors as a service to Ubuntu users.
# deb http://archive.canonical.com/ubuntu maverick partner
# deb-src http://archive.canonical.com/ubuntu maverick partner

deb http://security.ubuntu.com/ubuntu maverick-security main
deb-src http://security.ubuntu.com/ubuntu maverick-security main
deb http://security.ubuntu.com/ubuntu maverick-security universe
deb-src http://security.ubuntu.com/ubuntu maverick-security universe
# deb http://security.ubuntu.com/ubuntu maverick-security multiverse
# deb-src http://security.ubuntu.com/ubuntu maverick-security multiverse

With sed this is a simple sed -i 's/^# deb/deb/' /etc/apt/sources.list. What's the most elegant ("pythonic") way to do this?

You can do that like this:

import re

with open("/etc/apt/sources.list", "r") as sources:
    lines = sources.readlines()
with open("/etc/apt/sources.list", "w") as sources:
    for line in lines:
        sources.write(re.sub(r'^# deb', 'deb', line))

The with statement ensures that the file is closed correctly, and re-opening the file in "w" mode empties the file before you write to it. re.sub(pattern, replace, string) is the equivalent of s/pattern/replace/ in sed/perl.
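To see what the substitution does without touching the real file, here is a minimal sketch that applies the same re.sub call to a few lines held in memory. The sample lines and the helper name uncomment are made up for illustration. Note that the pattern also matches "# deb-src" lines and leaves "##" comments alone.

```python
import re

def uncomment(line):
    # Same substitution as sed's s/^# deb/deb/, applied to a single line.
    return re.sub(r'^# deb', 'deb', line)

# Hypothetical sample lines, modelled on the sources.list shown in the question.
sample = [
    "## security team.\n",
    "# deb http://archive.canonical.com/ubuntu maverick partner\n",
    "# deb-src http://archive.canonical.com/ubuntu maverick partner\n",
    "deb http://security.ubuntu.com/ubuntu maverick-security main\n",
]

result = [uncomment(line) for line in sample]
for line in result:
    print(line, end="")
```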

Edit: fixed syntax in example

Authoring a homegrown sed replacement in pure Python with no external commands or additional dependencies is a noble task laden with noble landmines. Who would have thought?

Nonetheless, it is feasible. It's also desirable. We've all been there, people: "I need to munge some plaintext files, but I only have Python, two plastic shoelaces, and a moldy can of bunker-grade Maraschino cherries. Help."

In this answer, we offer a best-of-breed solution cobbling together the awesomeness of prior answers without all of that unpleasant not-awesomeness. As plundra notes, David Miller's otherwise top-notch answer writes the desired file non-atomically and hence invites race conditions (e.g., from other threads and/or processes attempting to concurrently read that file). That's bad. Plundra's otherwise excellent answer solves that issue while introducing yet more, including numerous fatal encoding errors, a critical security vulnerability (failing to preserve the permissions and other metadata of the original file), and premature optimization replacing regular expressions with low-level character indexing. That's also bad.

Awesomeness, unite!

import re, shutil, tempfile

def sed_inplace(filename, pattern, repl):
    '''
    Perform the pure-Python equivalent of in-place `sed` substitution: e.g.,
    `sed -i -e 's/'${pattern}'/'${repl}'/' "${filename}"`.
    '''
    # For efficiency, precompile the passed regular expression.
    pattern_compiled = re.compile(pattern)

    # For portability, NamedTemporaryFile() defaults to mode "w+b" (i.e., binary
    # writing with updating). This is usually a good thing. In this case,
    # however, binary writing imposes non-trivial encoding constraints trivially
    # resolved by switching to text writing. Let's do that.
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
        with open(filename) as src_file:
            for line in src_file:
                tmp_file.write(pattern_compiled.sub(repl, line))

    # Overwrite the original file with the munged temporary file in a
    # manner preserving file attributes (e.g., permissions).
    shutil.copystat(filename, tmp_file.name)
    shutil.move(tmp_file.name, filename)

# Do it for Johnny.
sed_inplace('/etc/apt/sources.list', r'^\# deb', 'deb')
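One caveat worth keeping in mind: NamedTemporaryFile() creates its file in the default temporary directory unless told otherwise, and shutil.move() is only a plain rename when source and destination live on the same filesystem (otherwise it copies the file and then removes the source). The following is a sketch of a variant, not part of the original answer, that creates the temporary file next to the target and swaps it in with os.replace(). The name sed_inplace_atomic is hypothetical.

```python
import os
import re
import shutil
import tempfile

def sed_inplace_atomic(filename, pattern, repl):
    """Hypothetical variant of sed_inplace() that keeps the temp file next to the target."""
    pattern_compiled = re.compile(pattern)
    target_dir = os.path.dirname(os.path.abspath(filename))

    # Creating the temporary file in the target's directory keeps both files
    # on the same filesystem, so the final os.replace() is a plain rename.
    with tempfile.NamedTemporaryFile(mode='w', dir=target_dir, delete=False) as tmp_file:
        with open(filename) as src_file:
            for line in src_file:
                tmp_file.write(pattern_compiled.sub(repl, line))

    try:
        # Copy permissions and timestamps, then swap the files.
        shutil.copystat(filename, tmp_file.name)
        os.replace(tmp_file.name, filename)
    except OSError:
        # Don't leave a stray temporary file behind if the swap fails.
        os.unlink(tmp_file.name)
        raise
```

The trade-off is that the process needs write permission on the target's directory, which is already required for replacing the file there.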

massedit.py (http://github.com/elmotec/massedit) does the scaffolding for you, leaving just the regex to write. It's still in beta, but we are looking for feedback.

python -m massedit -e "re.sub(r'^# deb', 'deb', line)" /etc/apt/sources.list

will show the differences (before/after) in diff format.

Add the -w option to write the changes to the original file:

python -m massedit -e "re.sub(r'^# deb', 'deb', line)" -w /etc/apt/sources.list

Alternatively, you can now use the API:

>>> import massedit
>>> filenames = ['/etc/apt/sources.list']
>>> massedit.edit_files(filenames, ["re.sub(r'^# deb', 'deb', line)"], dry_run=True)

This is such a different approach that I don't want to edit my other answer. The with statements are nested since I don't use 3.1 (where with A() as a, B() as b: works).

It might be a bit overkill for changing sources.list, but I want to put it out there for future searches.

#!/usr/bin/env python
from shutil   import move
from tempfile import NamedTemporaryFile

# Text mode, so that str lines can be written (the default mode is binary).
with NamedTemporaryFile(mode="w", delete=False) as tmp_sources:
    with open("sources.list") as sources_file:
        for line in sources_file:
            if line.startswith("# deb"):
                tmp_sources.write(line[2:])
            else:
                tmp_sources.write(line)

move(tmp_sources.name, sources_file.name)

This should ensure no race conditions with other people reading the file. Oh, and I prefer str.startswith(...) when you can do without a regexp.

If you are using Python 3, the following module will help you: https://github.com/mahmoudadel2/pysed

wget https://raw.githubusercontent.com/mahmoudadel2/pysed/master/pysed.py

Place the module file into your Python 3 module path, then:

import pysed
pysed.replace(<Old string>, <Replacement String>, <Text File>)
pysed.rmlinematch(<Unwanted string>, <Text File>)
pysed.rmlinenumber(<Unwanted Line Number>, <Text File>)

Try pysed:

pysed -r '# deb' 'deb' /etc/apt/sources.list

If you really want to use a sed command without installing a new Python module, you could simply do the following:

import subprocess
subprocess.call(["sed", "-i", "s/^# deb/deb/", "/etc/apt/sources.list"])

Not sure about elegant, but this ought to be pretty readable at least. For a sources.list it's fine to read all the lines beforehand; for something larger you might want to change it "in place" while looping through it.

#!/usr/bin/env python
# Open file for reading and writing
with open("sources.list", "r+") as sources_file:
    # Read all the lines
    lines = sources_file.readlines()

    # Rewind and truncate
    sources_file.seek(0)
    sources_file.truncate()

    # Loop through the lines, adding them back to the file.
    for line in lines:
        if line.startswith("# deb"):
            sources_file.write(line[2:])
        else:
            sources_file.write(line)

EDIT: Use the with statement for better file handling. I had also forgotten to rewind before truncating.
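For the "in place while looping" case, the standard library's fileinput module is one option. The sketch below is not from the original answer, and the helper name uncomment_deb_lines is hypothetical. With inplace=True, fileinput moves the original file aside and redirects standard output into a new file at the same path, so only one line is held in memory at a time.

```python
import fileinput

def uncomment_deb_lines(path):
    """Hypothetical helper: strip the leading '# ' from '# deb...' lines in place."""
    # While the loop runs, print() writes into the replacement file,
    # not to the terminal.
    with fileinput.input(files=[path], inplace=True) as lines:
        for line in lines:
            if line.startswith("# deb"):
                line = line[2:]
            print(line, end="")
```

Calling uncomment_deb_lines("sources.list") would then rewrite that file. Passing backup=".bak" to fileinput.input() keeps a copy of the original.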

You could do something like:

import re

# Raw string, so the backslash reaches the regex engine untouched.
p = re.compile(r"^\# *deb", re.MULTILINE)
with open("sources.list", "r") as f:
    text = f.read()
with open("sources.list", "w") as f:
    f.write(p.sub("deb", text))

Alternatively (imho, this is better from an organizational standpoint), you could split your sources.list into pieces (one entry per repository) and place them under /etc/apt/sources.list.d/.

Cecil Curry has a great answer; however, his answer only works for single-line regular expressions. Multiline regular expressions are used more rarely, but they are handy sometimes.

Here is an improvement upon his sed_inplace function that allows it to work with multiline regular expressions if asked to do so.

WARNING: In multiline mode, it reads the entire file into memory and then performs the regular expression substitution, so you'll only want to use this mode on small-ish files. Don't try to run it on gigabyte-sized files in multiline mode.

import re, shutil, tempfile

def sed_inplace(filename, pattern, repl, multiline = False):
    '''
    Perform the pure-Python equivalent of in-place `sed` substitution: e.g.,
    `sed -i -e 's/'${pattern}'/'${repl}'/' "${filename}"`.
    '''
    re_flags = 0
    if multiline:
        re_flags = re.M

    # For efficiency, precompile the passed regular expression.
    pattern_compiled = re.compile(pattern, re_flags)

    # For portability, NamedTemporaryFile() defaults to mode "w+b" (i.e., binary
    # writing with updating). This is usually a good thing. In this case,
    # however, binary writing imposes non-trivial encoding constraints trivially
    # resolved by switching to text writing. Let's do that.
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
        with open(filename) as src_file:
            if multiline:
                content = src_file.read()
                tmp_file.write(pattern_compiled.sub(repl, content))
            else:
                for line in src_file:
                    tmp_file.write(pattern_compiled.sub(repl, line))

    # Overwrite the original file with the munged temporary file in a
    # manner preserving file attributes (e.g., permissions).
    shutil.copystat(filename, tmp_file.name)
    shutil.move(tmp_file.name, filename)

from os.path import expanduser
sed_inplace('%s/.gitconfig' % expanduser("~"), r'^(\[user\]$\n[ \t]*name = ).*$(\n[ \t]*email = ).*', r'\1John Doe\2jdoe@example.com', multiline=True)
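To illustrate why the multiline flag changes the result, here is a small in-memory sketch. The sample text is made up, and the pattern is the one from the .gitconfig call above. Applied line by line, no single line contains the whole pattern, so nothing is replaced. Applied to the whole text with re.M, '^' and '$' match at line boundaries and the pattern can span several lines.

```python
import re

# Hypothetical file content, modelled on the .gitconfig example above.
text = "[user]\n\tname = Old Name\n\temail = old@example.com\n"
pattern = r'^(\[user\]$\n[ \t]*name = ).*$(\n[ \t]*email = ).*'
repl = r'\1John Doe\2jdoe@example.com'

# Line by line: each line is matched on its own, so the pattern never fits.
per_line = "".join(
    re.sub(pattern, repl, line) for line in text.splitlines(keepends=True)
)

# Whole text with re.M: the pattern spans the three lines and is replaced.
whole_text = re.sub(pattern, repl, text, flags=re.M)

print(per_line == text)
print(whole_text, end="")
```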

If I want something like sed, then I usually just call sed itself using the sh library.

from sh import sed

sed(['-i', 's/^# deb/deb/', '/etc/apt/sources.list'])

Sure, there are downsides. For example, the locally installed version of sed may not be the same as the one you tested with. In my case, this kind of thing can be easily handled at another layer (like by examining the target environment beforehand, or deploying in a docker image with a known version of sed).

Here's a one-module Python replacement for perl -p:

# Provide compatibility with `perl -p`

# Usage:
#
#     python -mloop_over_stdin_lines '<program>'

# In `<program>`, use the variable `line` to read and change the current line.

# Example:
#
#         python -mloop_over_stdin_lines 'line = re.sub("pattern", "replacement", line)'

# From the perlrun documentation:
#
#        -p   causes Perl to assume the following loop around your
#             program, which makes it iterate over filename arguments
#             somewhat like sed:
# 
#               LINE:
#                 while (<>) {
#                     ...             # your program goes here
#                 } continue {
#                     print or die "-p destination: $!\n";
#                 }
# 
#             If a file named by an argument cannot be opened for some
#             reason, Perl warns you about it, and moves on to the next
#             file. Note that the lines are printed automatically. An
#             error occurring during printing is treated as fatal. To
#             suppress printing use the -n switch. A -p overrides a -n
#             switch.
# 
#             "BEGIN" and "END" blocks may be used to capture control
#             before or after the implicit loop, just as in awk.
# 

import re
import sys

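for line in sys.stdin:
    # The program may read and reassign `line`; at module level,
    # locals() is the same dict as globals().
    exec(sys.argv[1], globals(), locals())
    try:
        print(line, end='')
    except Exception:
        sys.exit('-p destination: $!\n')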

I wanted to be able to find and replace text but also include matched groups in the content I insert. I wrote this short script to do that:

https://gist.github.com/turtlemonvh/0743a1c63d1d27df3f17

The key component of that is something that looks like this:

print(re.sub(pattern, template, text).rstrip("\n"))

Here's an example of how that works:

# Find everything that looks like 'dog' or 'cat' followed by a space and a number
pattern = r"((cat|dog) (\d+))"

# Replace with 'turtle' and the number. '3' because the number is the 3rd matched group.
# The double '\' is needed because you need to escape '\' when running this in a python shell
template = "turtle \\3"

# The text to operate on
text = "cat 976 is my favorite"

Calling the above function with this yields:

turtle 976 is my favorite
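As a side note, here is a short sketch of three equivalent ways to refer to a matched group in the replacement, using the same pattern and text as above. The variable names are made up for illustration.

```python
import re

pattern = r"((cat|dog) (\d+))"
text = "cat 976 is my favorite"

# A raw-string template avoids doubling the backslash.
by_number = re.sub(pattern, r"turtle \3", text)

# \g<3> is the unambiguous form, useful when a digit follows the reference.
by_g_syntax = re.sub(pattern, r"turtle \g<3>0", text)

# A function receives the match object and returns the replacement text.
by_function = re.sub(pattern, lambda m: "turtle " + m.group(3), text)

print(by_number)
print(by_g_syntax)
print(by_function)
```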

[None of the answers above works properly!]

I have a case of multiple key-value replacements in one file of around 1000 lines. After replacement, the file structure should stay the same. For example:

key1=value_tobe_replaced1
key2=value_tobe_replaced1
.     .
.     .
key1000=value_tobe_replaced1000

I've tried:

  1. the voted answer from @elmotec for massedit.

  2. the answer from @Cecil Curry.

  3. the answer from @Keithel.

The three answers definitely helped me a lot, but after testing I found that the 1st and 2nd take nearly 40-50 s. The 3rd is not suitable for multiple replacements, so I fixed it.

Notice: refer to those answers before going on.

Here's my code:

Line replacement mode:

start_time = datetime.datetime.now()
with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
    with open(abs_keypair_file) as kf:
        for line in kf:
            line_to_write = ''
            match_flag = False
            for (key, value) in tuple_list:
                # print '  %s = %r' % (key, value)
                if  not re.search(patten, line, flags=re.I):
                    continue
                line_to_write = re.sub(r'\$\({}\)'.format(key), value, line, flags=re.I)
                match_flag = True

            if not match_flag:
                line_to_write = line
            tmp_file.write(line_to_write)

shutil.copystat(abs_keypair_file, tmp_file.name)
shutil.move(tmp_file.name, abs_keypair_file)

time_costs = datetime.datetime.now() - start_time
print('time costs: %s' % time_costs)
time costs: 0:00:42.533879

File replacement mode:

start_time = datetime.datetime.now()
with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
    with open(abs_keypair_file) as kf:
        text = kf.read()
        for (key, value) in tuple_list:
            text = re.sub(patten, value, text, flags=re.M|re.I)
        tmp_file.write(text)
shutil.copystat(abs_keypair_file, tmp_file.name)
shutil.move(tmp_file.name, abs_keypair_file)

time_costs = datetime.datetime.now() - start_time
print('time costs: %s' % time_costs)
time costs: 0:00:00.348458

So if your case matches mine and your file is not too large, I suggest you follow the file replacement mode.

How to replace if the file size is huge? I have no idea.
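For files too large to read in one go, one possible approach (a sketch, not something measured in this answer) is to compile a single regular expression covering all keys once and then stream the file line by line. It assumes the placeholders have the form $(key), as in the re.sub call of the line replacement mode above, and that the list of pairs is not empty. The names make_replacer and replace_stream are hypothetical.

```python
import re

def make_replacer(pairs):
    """Hypothetical helper: build one function that expands every $(key) placeholder."""
    # Lower-case the keys so the lookup is case-insensitive, like flags=re.I above.
    table = {key.lower(): value for key, value in pairs}
    # One alternation covering all keys, compiled a single time.
    keys = "|".join(re.escape(key) for key in table)
    placeholder = re.compile(r"\$\((" + keys + r")\)", re.I)

    def replace_line(line):
        # Look up each matched key in the table instead of looping over all pairs.
        return placeholder.sub(lambda m: table[m.group(1).lower()], line)

    return replace_line

def replace_stream(src_lines, pairs):
    """Yield each input line with its placeholders expanded, one line at a time."""
    replace_line = make_replacer(pairs)
    for line in src_lines:
        yield replace_line(line)
```

The generator can be fed an open file object and its output written to a temporary file, in the same way as the two modes above, so memory use stays at roughly one line regardless of file size.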

Hope this helps.

Python has a regex module (import re). Why not use it the way you would in Perl? It has most of the features of a Perl regex.
