
Python string literals, regex and sed

I am new to Python.

I am importing a series of files into a sqlite3 database using a python script. Some of the raw files have spurious ^M characters that split the records into multiple lines.

The following sed command correctly removes the ^M and joins the two lines, creating a valid record.

sed -i '/^M^M$/ {s/^M//g;N;s/\n//};' <file>

The ^M in the above are created with the CTRL+V CTRL+M sequence.

The Python lines for the sed call are:

cmd = "sed -i '/\^M\^M$/ {s/\^M//g; N; s/\n////g; };' %s" % (file)
os.system(cmd)

I have tried a variety of escape sequences in Python (including triple-quoted ''' strings) and get sed parsing errors such as "unterminated address regex", "unterminated `s' command" and "unknown option to `s'". Without escaping the ^M, Python itself stops with SyntaxError: EOL while scanning string literal.

How can I either

a) Encode the sed call so that it will execute properly when called with os.system(cmd)

or

b) Perform the equivalent substitution in python directly (probably preferable, but I would want to be able to perform multiple types of corrections in one pass, not one pass per correction type).

Thank you.

The ^M character is a carriage return (CR). In Python it is written '\r'.
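Why ^M and '\r' are the same byte: caret notation, which terminals use (and which CTRL+V CTRL+M produces), names a control character by flipping bit 0x40 of the printable character after the caret, so ^M is 'M' (77) XOR 64 = 13, the carriage return. The helper below is purely illustrative, not part of any library.

```python
def caret_to_char(caret):
    r"""Map caret notation such as "^M" to the control character it names.

    Illustrative helper: caret notation flips bit 0x40 of the character
    after the caret, so "^M" -> chr(77 ^ 64) == chr(13) == "\r".
    """
    return chr(ord(caret[1].upper()) ^ 0x40)


print(repr(caret_to_char("^M")))  # '\r'
print(ord("\r"))                  # 13
```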

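So something like this should work. The CRs are embedded with "\r", the "\\n" reaches sed as \n, and the extra "//g" in the original "s/\n////g", which is what made sed report "unknown option to `s'", is gone:

cmd = "sed -i '/\r\r$/ {s/\r//g; N; s/\\n//; }' %s" % (file)
os.system(cmd)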
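If you stay with sed, a sketch that sidesteps shell quoting entirely: passing an argument list to subprocess means no shell is involved, so the script can contain raw CR bytes and the filename needs no escaping. It assumes GNU sed (BSD/macOS sed expects a backup-suffix argument after -i); SED_SCRIPT and sed_fix_file are illustrative names.

```python
import subprocess

# The sed program from the question: CRs written as Python "\r" escapes,
# and "\\n" so that sed (not Python) receives the two characters \n.
SED_SCRIPT = "/\r\r$/ {s/\r//g; N; s/\\n//; }"


def sed_fix_file(filename):
    """Hypothetical helper: run GNU sed in place on filename, without a shell."""
    # Each list element reaches sed as exactly one argument, CR bytes included,
    # so no shell quoting or escaping is needed.
    subprocess.check_call(["sed", "-i", SED_SCRIPT, filename])
```

Because check_call raises CalledProcessError on a non-zero exit status, a sed parsing error surfaces as a Python exception instead of being silently ignored as with os.system.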

It would be much easier, particularly since you say you have multiple substitutions to perform, to do this entirely in Python. The carriage return character is "\r".

Untested code for the task is as follows:

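# Applied in order: first rejoin records split by "\r\r\n" (as the sed
# command does), then drop any remaining stray CR.
replacements = (("\r\r\n", ""),
                ("\r", ""),
                ("one", "two"),
                ("three", "four"))
# newline="" (Python 3) stops "\r" being translated to "\n" on read and write.
with open(filename, "r", newline="") as fin, \
     open(filename + ".new", "w", newline="") as fout:
    data = fin.read()
    for t1, t2 in replacements:
        data = data.replace(t1, t2)
    fout.write(data)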

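It then remains as an exercise for the reader to rename the output file so that it overwrites the input file. Note, by the way, that this code is explicitly designed to work with text files. In Python 3 that makes a difference: by default, text-mode files translate "\r\n" and lone "\r" to "\n" when reading, so the CRs never reach the replacement loop unless open() is given newline="".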
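A sketch that finishes the exercise and also covers the "several corrections in one pass" requirement, swapping the loop for a single regular expression built from all the keys. It assumes Python 3.3+ (for os.replace); CORRECTIONS and fix_file are illustrative names. Unlike the chained replace() calls, each stretch of text is matched only once, so one correction's output is never re-scanned by another.

```python
import os
import re

# Hypothetical correction table. "\r\r\n" -> "" rejoins a split record (what
# the sed command does); "\r" -> "" drops stray CRs. Add more pairs as needed.
CORRECTIONS = {
    "\r\r\n": "",
    "\r": "",
}

# One alternation regex, so every correction happens in a single scan.
# Longer keys go first so "\r\r\n" wins over its prefix "\r".
_PATTERN = re.compile("|".join(
    re.escape(old) for old in sorted(CORRECTIONS, key=len, reverse=True)))


def fix_file(filename):
    tmp_name = filename + ".new"
    # newline="" turns off newline translation so the CRs survive reading.
    with open(filename, "r", newline="") as fin, \
         open(tmp_name, "w", newline="") as fout:
        data = fin.read()
        fout.write(_PATTERN.sub(lambda m: CORRECTIONS[m.group(0)], data))
    # Overwrite the original with the cleaned copy.
    os.replace(tmp_name, filename)
```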


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM