
sed command run using os.system() or subprocess.call() leaves CSV file without a delimiter

I am running a Python script that dumps tables from a Postgres database to CSV files, and I then want to remove the double quotes from all of those files, so I am using sed.
In my Python code:

sed_for_quotes = 'sed -i s/\\"//g /home/ubuntu/PTOR/csvdata1/'+table+'.csv'  
subprocess.call(sed_for_quotes, shell=True)  

The process completes without any error, but when I load these tables into Redshift I get the error No delimiter found. On checking the CSV, I find that one of the rows is only half-written: for example, if it ends in a timestamp column, only half of the timestamp is there, and there is no data after that (the CSV does contain that data before sed runs). That truncated row is what leads to the No delimiter found error.

But when I run sed -i s/\\"//g filename.csv on these files in the shell, it works fine, and the CSV has all of its rows after sed runs. I have checked that there is no problem with the data in the files.

Why does this not work from a Python program? I have also tried using sed -i.bak in the Python program, but that makes no difference.

Please note that I am using an extra backslash (\\) in the Python code because I need to escape the other backslash.
Other approaches tried:

  • Using subprocess.Popen without any buffer size and with positive buffer size, but that didn't help
  • Using subprocess.Popen(sed_for_quotes,bufsize=-4096) (negative buffer size) worked for one of the files which was giving the error, but then encountered the same problem in another file.

Do not use an intermediate shell when you do not need one, and check the return code of the subprocess to make sure it completed successfully (check_call does this for you):

import subprocess

path_to_file = ... # e.g. '/home/ubuntu/PTOR/csvdata1/' + table + '.csv'
# Arguments are passed to sed as a list, so no shell parses or re-quotes them.
subprocess.check_call(['sed', '-i', 's/"//g', path_to_file])

By "intermediate" shell I mean the shell process that subprocess starts to parse the command (roughly, it splits on whitespace, but it also processes quotes and escapes) and then run it (here, it runs sed). Since you know precisely which arguments sed should be invoked with, you do not need any of this, and it is best to avoid it.

Put your sed command into a shell script, e.g.:

#!/bin/bash
# Parameter $1 = Filename
sed -i 's/"//g' "$1"

Call your shell script using subprocess:

sed_for_quotes = 'my_sed_script /home/ubuntu/PTOR/csvdata1/'+table+'.csv'
subprocess.call(sed_for_quotes, shell=True)

Use shlex.split (documented at docs.python.org/3.6):
shlex.split(s, comments=False, posix=True)
Split the string s using shell-like syntax.
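
As a rough illustration of what shlex.split does with the command string from the question, the sketch below splits it into an argument list that could then be passed to subprocess without shell=True. The function name sed_argv and the table name 'users' are hypothetical.

```python
import shlex


def sed_argv(table):
    """Split the question's command string into a list of arguments."""
    # In Python source, '\\"' is the two characters \" ; shlex.split treats
    # that backslash as an escape, so the pattern becomes s/"//g.
    command = 'sed -i s/\\"//g /home/ubuntu/PTOR/csvdata1/' + table + '.csv'
    return shlex.split(command)


# 'users' is a made-up table name used only for this example.
print(sed_argv('users'))
# ['sed', '-i', 's/"//g', '/home/ubuntu/PTOR/csvdata1/users.csv']
```

Note that a table name containing whitespace would be split into several arguments by this approach, so building the list by hand may be safer in that case.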
