
Replacing words from a dictionary in a text file encoded in, for example, UTF-8

I'm trying to open a text file and read through it, replacing certain strings with replacements stored in a dictionary. My attempt is based on the answers to "Replacing words in text file using a dictionary" and "How to search and replace text in a file using Python?".

Like this:

# edited: the print statement is now the function call print(line)
import fileinput

text = "sample file.txt"
fields = {"pattern 1": "replacement text 1", "pattern 2": "replacement text 2"}

for line in fileinput.input(text, inplace=True):
    line = line.rstrip()
    for field in fields:
        if field in line:
            line = line.replace(field, fields[field])

    print(line)

My file is encoded in UTF-8.

When I run this, the console shows this error:

UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>
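
The 'charmap' in this message suggests that the file was decoded with a legacy single-byte code page instead of UTF-8. The following is a minimal sketch of how that can happen, assuming cp1252 is the platform default encoding (the question does not say which one is in use) and using the character "Á" purely as an illustration:

```python
# Assumption: the platform default encoding is cp1252 (not stated in the question).
raw = "Á".encode("utf-8")      # two bytes: 0xC3 0x81

try:
    raw.decode("cp1252")       # byte 0x81 has no mapping in cp1252
    error_name = None
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__
    print(exc)

decoded = raw.decode("utf-8")  # decoding with the matching codec succeeds
```

Reading the file with an explicit UTF-8 encoding avoids the problem without resorting to an error handler such as 'ignore'.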

When I add encoding="utf8" to fileinput.FileInput(), it raises this error:

TypeError: __init__() got an unexpected keyword argument 'encoding'

When I add openhook=fileinput.hook_encoded("utf8") to fileinput.FileInput(), it raises this error:

ValueError: FileInput cannot use an opening hook in inplace mode

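I do not want to work around this by adding an 'ignore' error handler that silently skips decoding errors.

I have a file and a dictionary, and I want to replace each dictionary key found in the file with its value, writing the result back to the file the way fileinput does through stdout.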

Source file in UTF-8:

Plain text on the line in the file.
This is a greeting to the world.
Hello world!
Here's another plain text.
And here too!

I want to replace the word "world" with the word "earth".

Dictionary: {"world": "earth"}

Modified file in UTF-8:

Plain text on the line in the file.
This is a greeting to the earth.
Hello earth!
Here's another plain text.
And here too!

The fileinput library has several problems that I addressed in a past blog post; one of them is that you can't set the encoding while using in-place file rewriting.

The following code can do this, but you have to replace your print() calls with writes to the outgoing file object:

from contextlib import contextmanager
import io
import os


@contextmanager
def inplace(filename, mode='r', buffering=-1, encoding=None, errors=None,
            newline=None, backup_extension=None):
    """Allow for a file to be replaced with new content.

    yields a tuple of (readable, writable) file objects, where writable
    replaces readable.

    If an exception occurs, the old file is restored, removing the
    written data.

    mode should *not* use 'w', 'a' or '+'; only read-only-modes are supported.

    """

    # move existing file to backup, create new file with same permissions
    # borrowed extensively from the fileinput module
    if set(mode).intersection('wa+'):
        raise ValueError('Only read-only file modes can be used')

    backupfilename = filename + (backup_extension or os.extsep + 'bak')
    try:
        os.unlink(backupfilename)
    except os.error:
        pass
    os.rename(filename, backupfilename)
    readable = io.open(backupfilename, mode, buffering=buffering,
                       encoding=encoding, errors=errors, newline=newline)
    try:
        perm = os.fstat(readable.fileno()).st_mode
    except OSError:
        writable = open(filename, 'w' + mode.replace('r', ''),
                        buffering=buffering, encoding=encoding, errors=errors,
                        newline=newline)
    else:
        os_mode = os.O_CREAT | os.O_WRONLY | os.O_TRUNC
        if hasattr(os, 'O_BINARY'):
            os_mode |= os.O_BINARY
        fd = os.open(filename, os_mode, perm)
        writable = io.open(fd, "w" + mode.replace('r', ''), buffering=buffering,
                           encoding=encoding, errors=errors, newline=newline)
        try:
            if hasattr(os, 'chmod'):
                os.chmod(filename, perm)
        except OSError:
            pass
    try:
        yield readable, writable
    except Exception:
        # move backup back
        try:
            os.unlink(filename)
        except os.error:
            pass
        os.rename(backupfilename, filename)
        raise
    finally:
        readable.close()
        writable.close()
        try:
            os.unlink(backupfilename)
        except os.error:
            pass

So your code would look like:

# inplace() is the context manager defined above; fileinput is no longer needed

text = "sample file.txt"
fields = {"pattern 1": "replacement text 1", "pattern 2": "replacement text 2"}

with inplace(text, encoding='utf8') as (infh, outfh):
    for line in infh:
        for field in fields:
            if field in line:
                line = line.replace(field, fields[field])

        outfh.write(line)

Note that you no longer have to strip the newline: unlike print(), outfh.write() does not append one.
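
For files that fit comfortably in memory, a shorter alternative is to read the whole text, apply the replacements, and swap the new file in with a single rename. This is only a sketch under that assumption, not the approach from the answer above; the helper name replace_in_file is hypothetical.

```python
import os
import tempfile


def replace_in_file(path, fields, encoding="utf-8"):
    """Rewrite path with every key of fields replaced by its value."""
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding=encoding, newline="") as src:
        text = src.read()
    for old, new in fields.items():
        text = text.replace(old, new)

    # Write to a temporary file in the same directory, then swap it in,
    # so the original stays intact if writing fails midway.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with open(fd, "w", encoding=encoding, newline="") as dst:
            dst.write(text)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Unlike inplace(), this sketch does not copy the original file's permissions to the new file, and it holds the entire file in memory at once.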

I tried to use this:

with open(fileName1, "r+", encoding = "utf8", newline='') as fileIn, open(fileName1, "r+", encoding = "utf8", newline='') as fileOut:
    for line in fileIn:             
        for field in fields:
            if field in line:
                line = line.replace(field, fields[field])
        fileOut.write(line)

Note: when the same file is used for both reading and writing, leftover data ("waste") ends up at the end of the file. So far I have not figured out why. Its size does not match the number of replacements (there are more replacements than lines of leftover data).

Pseudo-mathematically: oriA < modfA + subEnd(oriA)

I'm happy to correct this answer if someone can explain it.
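
One plausible explanation for the leftover data: opening a file in "r+" mode does not truncate it, so when the rewritten content is shorter than the original, the old tail stays in place. The sketch below shows that effect in isolation; the file name and contents are hypothetical and chosen only for illustration.

```python
import os
import tempfile

# Hypothetical demo file; name and contents are for illustration only.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8", newline="") as fh:
    fh.write("Hello universe!\n")

# Overwrite from the start with shorter text, without truncating.
with open(path, "r+", encoding="utf-8", newline="") as fh:
    fh.write("Hello earth!\n")

with open(path, encoding="utf-8", newline="") as fh:
    without_truncate = fh.read()   # the old tail is still there

# Same overwrite, but cut the file off at the current write position.
with open(path, "r+", encoding="utf-8", newline="") as fh:
    fh.write("Hello earth!\n")
    fh.truncate()

with open(path, encoding="utf-8", newline="") as fh:
    with_truncate = fh.read()      # only the new text remains
```

Truncating only covers the case where the output is shorter. When a replacement is longer than the text it replaces, writing into the same file can overwrite data that has not been read yet, which is why writing to a separate file is the safer route.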

Edit: When I use two files, everything works correctly: replace fileName1 in the second open() with fileName2 and change its mode argument to "w+".
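
The two-file variant can be sketched as a small function. The name replace_to_new_file and its parameters are hypothetical, and the sketch opens the output with "w" rather than "w+" because it never reads the output back.

```python
def replace_to_new_file(src_path, dst_path, fields, encoding="utf-8"):
    """Copy src_path to dst_path, replacing each key of fields with its value."""
    # newline="" on both sides preserves the original line endings.
    with open(src_path, "r", encoding=encoding, newline="") as file_in, \
         open(dst_path, "w", encoding=encoding, newline="") as file_out:
        for line in file_in:
            for field, replacement in fields.items():
                line = line.replace(field, replacement)
            file_out.write(line)
```

Because the output file is created fresh, no stale data from the source can remain at its end.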
