How can I do sed-like text replacement with Python?

Question

I want to enable all the apt repositories in this file:

cat /etc/apt/sources.list
## Note, this file is written by cloud-init on first boot of an instance                                                                                                            
## modifications made here will not survive a re-bundle.                                                                                                                            
## if you wish to make changes you can:                                                                                                                                             
## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg                                                                                                                
##     or do the same in user-data
## b.) add sources in /etc/apt/sources.list.d                                                                                                                                       
#                                                                                                                                                                                   

# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to                                                                                                           
# newer versions of the distribution.                                                                                                                                               
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick main                                                                                                                   
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick main                                                                                                               

## Major bug fix updates produced after the final release of the                                                                                                                    
## distribution.                                                                                                                                                                    
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates main                                                                                                           
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates main                                                                                                       

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu                                                                                                         
## team. Also, please note that software in universe WILL NOT receive any                                                                                                           
## review or updates from the Ubuntu security team.                                                                                                                                 
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick universe                                                                                                               
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick universe                                                                                                           
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates universe
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates universe

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu 
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in 
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick multiverse
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates multiverse

## Uncomment the following two lines to add software from the 'backports'
## repository.
## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
## Also, please note that software in backports WILL NOT receive any review
## or updates from the Ubuntu security team.
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-backports main restricted universe multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-backports main restricted universe multiverse

## Uncomment the following two lines to add software from Canonical's
## 'partner' repository.
## This software is not part of Ubuntu, but is offered by Canonical and the
## respective vendors as a service to Ubuntu users.
# deb http://archive.canonical.com/ubuntu maverick partner
# deb-src http://archive.canonical.com/ubuntu maverick partner

deb http://security.ubuntu.com/ubuntu maverick-security main
deb-src http://security.ubuntu.com/ubuntu maverick-security main
deb http://security.ubuntu.com/ubuntu maverick-security universe
deb-src http://security.ubuntu.com/ubuntu maverick-security universe
# deb http://security.ubuntu.com/ubuntu maverick-security multiverse
# deb-src http://security.ubuntu.com/ubuntu maverick-security multiverse

With sed this is a simple sed -i 's/^# deb/deb/' /etc/apt/sources.list. What is the most elegant ("pythonic") way to do it?

Answer 1

You can do it like this:

import re

with open("/etc/apt/sources.list", "r") as sources:
    lines = sources.readlines()
with open("/etc/apt/sources.list", "w") as sources:
    for line in lines:
        sources.write(re.sub(r'^# deb', 'deb', line))

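The with statement ensures that the file is closed properly, and reopening the file in "w" mode empties it before you write to it. re.sub(pattern, replace, string) is the equivalent of s/pattern/replace/ in sed/perl.

Edit: fixed the syntax in the example.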
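To see the substitution in isolation, here is a minimal sketch that applies the same re.sub call to a few in-memory lines instead of a file. The helper name enable_repos and the sample lines are illustrative only and are not part of the answer above.

```python
import re

def enable_repos(lines):
    """Uncomment lines that start with '# deb' (this also covers '# deb-src')."""
    return [re.sub(r'^# deb', 'deb', line) for line in lines]

sample = [
    "## a real comment\n",
    "# deb http://archive.canonical.com/ubuntu maverick partner\n",
    "# deb-src http://archive.canonical.com/ubuntu maverick partner\n",
    "deb http://security.ubuntu.com/ubuntu maverick-security main\n",
]

result = enable_repos(sample)
print("".join(result), end="")
```

Lines beginning with "##" are left alone, because the pattern requires a single "#" followed by a space and "deb".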

Answer 2

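Authoring a homegrown sed replacement in pure Python, with no external commands or additional dependencies, is a noble task laden with noble landmines. Who would have thought?

Nonetheless, it is feasible. It is also desirable. We have all been there, people: "I need to munge some plain-text files, but I only have Python, two plastic shoelaces, and a moldy can of bunker-grade Maraschino cherries. Help."

In this answer, we offer a best-of-breed solution that cobbles together the awesomeness of the prior answers without all of that unpleasant not-awesomeness. As plundra notes, David Miller's otherwise top-notch answer writes the desired file non-atomically and hence invites race conditions (e.g., from other threads and/or processes attempting to read that file at the same time). That's bad. Plundra's otherwise excellent answer solves that issue while introducing yet more: numerous fatal encoding errors, a critical security vulnerability (failing to preserve the permissions and other metadata of the original file), and a premature optimization that replaces regular expressions with low-level character indexing. That's also bad.

Awesomeness, unite!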

import re, shutil, tempfile

def sed_inplace(filename, pattern, repl):
    '''
    Perform the pure-Python equivalent of in-place `sed` substitution: e.g.,
    `sed -i -e "s/${pattern}/${repl}/" "${filename}"`.
    '''
    # For efficiency, precompile the passed regular expression.
    pattern_compiled = re.compile(pattern)

    # For portability, NamedTemporaryFile() defaults to mode "w+b" (i.e., binary
    # writing with updating). This is usually a good thing. In this case,
    # however, binary writing imposes non-trivial encoding constraints trivially
    # resolved by switching to text writing. Let's do that.
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
        with open(filename) as src_file:
            for line in src_file:
                tmp_file.write(pattern_compiled.sub(repl, line))

    # Overwrite the original file with the munged temporary file in a
    # manner preserving file attributes (e.g., permissions).
    shutil.copystat(filename, tmp_file.name)
    shutil.move(tmp_file.name, filename)

# Do it for Johnny.
sed_inplace('/etc/apt/sources.list', r'^\# deb', 'deb')
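One caveat with the function above: NamedTemporaryFile() creates its file in the system temporary directory by default, which may sit on a different filesystem from the target. In that case shutil.move() falls back to copying and then deleting, so the final step is no longer a single rename. A hedged variant, assuming a POSIX-like system, creates the temporary file next to the target and swaps it in with os.replace(). The name sed_inplace_atomic is made up for this sketch.

```python
import os
import re
import shutil
import tempfile

def sed_inplace_atomic(filename, pattern, repl):
    """Like sed_inplace(), but the temporary file lives beside the target."""
    pattern_compiled = re.compile(pattern)
    target_dir = os.path.dirname(os.path.abspath(filename))

    # Same directory means same filesystem, so os.replace() is a plain rename.
    with tempfile.NamedTemporaryFile(mode='w', dir=target_dir, delete=False) as tmp_file:
        with open(filename) as src_file:
            for line in src_file:
                tmp_file.write(pattern_compiled.sub(repl, line))

    try:
        shutil.copystat(filename, tmp_file.name)  # keep permissions and timestamps
        os.replace(tmp_file.name, filename)       # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp_file.name)                  # clean up if the swap failed
        raise
```

It is called the same way as the original, for example sed_inplace_atomic('/etc/apt/sources.list', r'^# deb', 'deb'). Writing next to the target requires write permission on that directory.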

Answer 3

massedit.py (http://github.com/elmotec/massedit) does the scaffolding for you, leaving just the regular expression to write. It is still in beta, but we are looking for feedback.

python -m massedit -e "re.sub(r'^# deb', 'deb', line)" /etc/apt/sources.list

This shows the differences (before/after) in diff format.

Add the -w option to write the changes to the original file:

python -m massedit -e "re.sub(r'^# deb', 'deb', line)" -w /etc/apt/sources.list

Alternatively, you can now use the API:

>>> import massedit
>>> filenames = ['/etc/apt/sources.list']
>>> massedit.edit_files(filenames, ["re.sub(r'^# deb', 'deb', line)"], dry_run=True)

Answer 4

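Here is a different approach; I didn't want to edit my other answer. The with statements are nested because I don't use 3.1 (where with A() as a, B() as b: works).

It may be overkill for changing sources.list, but I wanted to put it out there for future searches.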

#!/usr/bin/env python
from shutil   import move
from tempfile import NamedTemporaryFile

# Text mode: the default "w+b" mode rejects str writes on Python 3.
with NamedTemporaryFile(mode="w", delete=False) as tmp_sources:
    with open("sources.list") as sources_file:
        for line in sources_file:
            if line.startswith("# deb"):
                tmp_sources.write(line[2:])
            else:
                tmp_sources.write(line)

move(tmp_sources.name, sources_file.name)

This should ensure there are no race conditions with someone else reading the file. Oh, and I prefer str.startswith(...) when you can do without a regex.
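As a quick sanity check of the str.startswith() approach, the sketch below compares it with the regex from the question on a few sample lines. The function names and sample lines are illustrative.

```python
import re

def uncomment_startswith(line):
    # Drop the leading "# " (two characters) when the line starts with "# deb".
    return line[2:] if line.startswith("# deb") else line

def uncomment_regex(line):
    return re.sub(r'^# deb', 'deb', line)

samples = [
    "# deb http://archive.canonical.com/ubuntu maverick partner\n",
    "## Uncomment the following two lines\n",
    "deb http://security.ubuntu.com/ubuntu maverick-security main\n",
    "#deb no-space-after-hash\n",
]

# Both helpers should agree on every sample line.
for line in samples:
    assert uncomment_startswith(line) == uncomment_regex(line)
```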

Answer 5

If you are using Python 3, the following module will help you: https://github.com/mahmoudadel2/pysed

wget https://raw.githubusercontent.com/mahmoudadel2/pysed/master/pysed.py

Place the module file in your Python 3 module path, then:

import pysed
pysed.replace(<Old string>, <Replacement String>, <Text File>)
pysed.rmlinematch(<Unwanted string>, <Text File>)
pysed.rmlinenumber(<Unwanted Line Number>, <Text File>)

Answer 6

Try pysed:

pysed -r '# deb' 'deb' /etc/apt/sources.list

Answer 7

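If you really want to use the sed command without installing a new Python module, you can simply do the following:

import subprocess
# Pass the command as a list so that no shell is needed.
subprocess.call(["sed", "-i", "s/^# deb/deb/", "/etc/apt/sources.list"])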

Answer 8

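Not sure about elegant, but this should at least be fairly readable. For a sources.list it is fine to read all the lines up front; for something larger you might want to change it "in place" while looping through it.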

#!/usr/bin/env python
# Open file for reading and writing
with open("sources.list", "r+") as sources_file:
    # Read all the lines
    lines = sources_file.readlines()

    # Rewind and truncate
    sources_file.seek(0)
    sources_file.truncate()

    # Loop through the lines, adding them back to the file.
    for line in lines:
        if line.startswith("# deb"):
            sources_file.write(line[2:])
        else:
            sources_file.write(line)

Edit: now uses the with statement for better file handling. I had also forgotten to rewind before truncating.
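For the "change it in place while looping" idea, the standard library's fileinput module is one option: with inplace=True it moves the original to a backup file and redirects print() output into a new file with the original name. A minimal sketch, with a hypothetical helper name:

```python
import fileinput

def uncomment_deb_lines(path):
    """Rewrite `path` in place, uncommenting lines that start with '# deb'."""
    with fileinput.input(files=[path], inplace=True) as f:
        for line in f:
            if line.startswith("# deb"):
                line = line[2:]
            # While inplace=True is active, print() writes into the file.
            print(line, end="")
```

Only one line is held in memory at a time, but the rewrite is not atomic, so a concurrent reader may see a partially written file.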

Answer 9

You could do something like this:

import re

p = re.compile(r"^# *deb", re.MULTILINE)
with open("sources.list", "r") as f:
    text = f.read()
with open("sources.list", "w") as f:
    f.write(p.sub("deb", text))

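Alternatively (and IMHO this is better from an organizational standpoint), you could split your sources.list into pieces (one entry per repository) and place them under /etc/apt/sources.list.d/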

Answer 10

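Cecil Curry has a great answer, but it does not work for multi-line regular expressions. Multi-line regular expressions are rarely needed, but they are handy sometimes.

Here is an improvement on his sed_inplace function that lets it work with multi-line regular expressions when asked to.

WARNING: in multi-line mode it reads the entire file and then performs the regular expression substitution, so use this mode only on smallish files. Don't try to run it on gigabyte-sized files in multi-line mode.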

import re, shutil, tempfile

def sed_inplace(filename, pattern, repl, multiline = False):
    '''
    Perform the pure-Python equivalent of in-place `sed` substitution: e.g.,
    `sed -i -e "s/${pattern}/${repl}/" "${filename}"`.
    '''
    re_flags = 0
    if multiline:
        re_flags = re.M

    # For efficiency, precompile the passed regular expression.
    pattern_compiled = re.compile(pattern, re_flags)

    # For portability, NamedTemporaryFile() defaults to mode "w+b" (i.e., binary
    # writing with updating). This is usually a good thing. In this case,
    # however, binary writing imposes non-trivial encoding constraints trivially
    # resolved by switching to text writing. Let's do that.
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
        with open(filename) as src_file:
            if multiline:
                content = src_file.read()
                tmp_file.write(pattern_compiled.sub(repl, content))
            else:
                for line in src_file:
                    tmp_file.write(pattern_compiled.sub(repl, line))

    # Overwrite the original file with the munged temporary file in a
    # manner preserving file attributes (e.g., permissions).
    shutil.copystat(filename, tmp_file.name)
    shutil.move(tmp_file.name, filename)

from os.path import expanduser
sed_inplace('%s/.gitconfig' % expanduser("~"), r'^(\[user\]$\n[ \t]*name = ).*$(\n[ \t]*email = ).*', r'\1John Doe\2jdoe@example.com', multiline=True)
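The difference between the two modes comes down to whether a match can span a newline. A small in-memory sketch (the sample text is made up, modelled on the .gitconfig call above):

```python
import re

text = "[user]\n\tname = Old Name\n\temail = old@example.com\n"
pattern = r'^(\[user\]$\n[ \t]*name = ).*$(\n[ \t]*email = ).*'
repl = r'\1John Doe\2jdoe@example.com'

# Whole-text mode with re.M: ^ and $ match at every line boundary,
# and the pattern may span several lines.
whole = re.sub(pattern, repl, text, flags=re.M)

# Line-by-line mode: each line is matched on its own, so a pattern that
# needs text after a "\n" can never match.
per_line = "".join(
    re.sub(pattern, repl, line) for line in text.splitlines(keepends=True)
)

print(whole, end="")
```

Here whole contains the new name and email, while per_line is identical to the input.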

Answer 11

If I want something like sed, I usually just call sed itself using the sh library.

from sh import sed

sed(['-i', 's/^# deb/deb/', '/etc/apt/sources.list'])

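Sure, there are downsides. For example, the locally installed version of sed might differ from the one you tested with. In my cases, this kind of thing is easily handled at another layer (e.g. by checking the target environment beforehand, or by deploying in a Docker image with a known version of sed).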

Answer 12

Here is a single-module Python replacement for perl -p:

# Provide compatibility with `perl -p`

# Usage:
#
#     python -mloop_over_stdin_lines '<program>'

# In, `<program>`, use the variable `line` to read and change the current line.

# Example:
#
#         python -mloop_over_stdin_lines 'line = re.sub("pattern", "replacement", line)'

# From the perlrun documentation:
#
#        -p   causes Perl to assume the following loop around your
#             program, which makes it iterate over filename arguments
#             somewhat like sed:
# 
#               LINE:
#                 while (<>) {
#                     ...             # your program goes here
#                 } continue {
#                     print or die "-p destination: $!\n";
#                 }
# 
#             If a file named by an argument cannot be opened for some
#             reason, Perl warns you about it, and moves on to the next
#             file. Note that the lines are printed automatically. An
#             error occurring during printing is treated as fatal. To
#             suppress printing use the -n switch. A -p overrides a -n
#             switch.
# 
#             "BEGIN" and "END" blocks may be used to capture control
#             before or after the implicit loop, just as in awk.
# 

import re
import sys

for line in sys.stdin:
    # The program may rebind `line`; at module level locals() is globals().
    exec(sys.argv[1], globals(), locals())
    try:
        print(line, end='')
    except Exception:
        sys.exit('-p destination: $!\n')

Answer 13

I wanted to be able to find and replace text, but also to include matched groups in what I insert. I wrote this short script to do that:

https://gist.github.com/turtlemonvh/0743a1c63d1d27df3f17

The key component of it looks something like this:

print(re.sub(pattern, template, text).rstrip("\n"))

Here is an example of how it works:

# Find everything that looks like 'dog' or 'cat' followed by a space and a number
pattern = r"((cat|dog) (\d+))"

# Replace with 'turtle' and the number. '3' because the number is the 3rd matched group.
# The double '\' is needed because you need to escape '\' when running this in a python shell
template = "turtle \\3"

# The text to operate on
text = "cat 976 is my favorite"

Calling the function above with these values yields:

turtle 976 is my favorite
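Put together as a runnable snippet, the example above might look like the following minimal sketch. It uses raw strings, so neither \d nor \3 needs a doubled backslash.

```python
import re

# 'cat' or 'dog', a space, then a number (the number is group 3).
pattern = r"((cat|dog) (\d+))"
template = r"turtle \3"
text = "cat 976 is my favorite"

result = re.sub(pattern, template, text).rstrip("\n")
print(result)  # turtle 976 is my favorite
```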

Answer 14

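[None of the above answers worked for me!]

I had a situation with multiple key-value replacements in a file of about 1000 lines. The file structure had to remain unchanged after the replacements. For example: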

key1=value_tobe_replaced1
key2=value_tobe_replaced1
.     .
.     .
key1000=value_tobe_replaced1000

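I tried:

The voted answer from @elmotec on massedit.
The answer from @Cecil Curry.
The answer from @Keithel.

These three answers helped me a lot, but in testing I found that the first and second took close to 40-50 seconds. The third is not suited to multiple replacements, so I adapted it.

Note: please read those answers before continuing.

Here is my code:

Line replacement mode: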

import datetime, re, shutil, tempfile

# abs_keypair_file (a path) and tuple_list (a list of (key, value) pairs)
# are defined elsewhere.
start_time = datetime.datetime.now()
with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
    with open(abs_keypair_file) as kf:
        for line in kf:
            line_to_write = ''
            match_flag = False
            for (key, value) in tuple_list:
                # Assumed: the undefined `patten` in the original was this
                # per-key placeholder regex, e.g. $(key1).
                pattern = r'\$\({}\)'.format(key)
                if not re.search(pattern, line, flags=re.I):
                    continue
                line_to_write = re.sub(pattern, value, line, flags=re.I)
                match_flag = True

            if not match_flag:
                line_to_write = line
            tmp_file.write(line_to_write)

shutil.copystat(abs_keypair_file, tmp_file.name)
shutil.move(tmp_file.name, abs_keypair_file)

time_costs = datetime.datetime.now() - start_time
print('time costs: %s' % time_costs)

time costs: 0:00:42.533879

File replacement mode:

start_time = datetime.datetime.now()
with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
    with open(abs_keypair_file) as kf:
        text = kf.read()
        for (key, value) in tuple_list:
            # Assumed: the undefined `patten` in the original was the
            # per-key placeholder regex, e.g. $(key1).
            pattern = r'\$\({}\)'.format(key)
            text = re.sub(pattern, value, text, flags=re.M|re.I)
        tmp_file.write(text)
shutil.copystat(abs_keypair_file, tmp_file.name)
shutil.move(tmp_file.name, abs_keypair_file)

time_costs = datetime.datetime.now() - start_time
print('time costs: %s' % time_costs)

time costs: 0:00:00.348458

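So if your situation matches mine and your file is not too big, I suggest you follow the file replacement mode.

How do you do the replacement if the file is large? I don't know.

Hope this helps.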
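For the large-file case, one possible approach (a sketch, not benchmarked here) is to compile all keys into a single alternation and stream the file line by line, looking each match up in a dictionary. It assumes placeholders of the form $(key), as suggested by the \$\({}\) pattern above, and case-insensitive keys. The name replace_placeholders and its parameters are hypothetical.

```python
import os
import re
import shutil
import tempfile

def replace_placeholders(path, pairs):
    """Replace every $(key) in `path` with its value in one streaming pass."""
    # Lower-case the keys so lookups agree with the case-insensitive regex.
    mapping = {key.lower(): value for key, value in pairs}
    if not mapping:
        return  # nothing to replace

    # One regex for all keys: \$\((key1|key2|...)\)
    alternation = '|'.join(re.escape(key) for key in mapping)
    regex = re.compile(r'\$\((' + alternation + r')\)', re.I)

    def lookup(match):
        return mapping[match.group(1).lower()]

    target_dir = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile(mode='w', dir=target_dir, delete=False) as tmp_file:
        with open(path) as src_file:
            for line in src_file:  # one line in memory at a time
                tmp_file.write(regex.sub(lookup, line))

    shutil.copystat(path, tmp_file.name)
    os.replace(tmp_file.name, path)
```

Each line is scanned once regardless of how many keys there are, instead of once per key, and memory use does not grow with the file size.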

Answer 15

Python has a regular expression module (import re). Why don't you want to use it as you would in Perl? It has all the features of Perl regular expressions.

Answer scores and dates:

Answer 1: 62, 2010-12-13 10:22:35
Answer 2: 33, 2015-07-19 07:56:08
Answer 3: 26, accepted
Answer 4: 12, 2010-12-13 11:31:18
Answer 5: 6, 2014-04-29 11:35:48
Answer 6: 3, 2014-06-23 07:35:06
Answer 7: 2, 2014-06-23 20:29:57
Answer 8: 2, 2010-12-13 10:11:38
Answer 9: 2, 2010-12-13 10:51:00
Answer 10: 2, 2019-08-02 20:12:55
Answer 11: 2, 2020-01-30 21:02:05
Answer 12: 1, 2013-06-19 17:31:09
Answer 13: 1, 2016-02-02 15:43:43
Answer 14: 1, 2020-04-29 09:31:40
Answer 15: 0, 2010-12-13 10:03:54
