I have to process multiple CSV files like the one below.
As you can see, each CSV file contains 2 header rows with different lengths. I only want to read the content that belongs to the 1st header ("1_LENGTH,2_LENGTH,START_POS,END_POS,RESULT_HIS,START_FRAME,START_SUPER_FRAME").
I then want to write all of it to a single CSV file. Is there any way to do this? Thank you very much.
My code so far is below:
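import pandas as pd
from pathlib import Path

def total():
    # log_path is assumed to be defined elsewhere as the directory holding the input CSV files
    header = ['CT_LENGTH', 'T_LENGTH', 'START_POS', 'END_POS', 'RESULT_HIS', 'START_FRAME', 'START_SUPER_FRAME']
    # Same location as before: TOTAL.csv next to (not inside) the current working directory
    out_path = Path.cwd().with_name("TOTAL.csv")
    # Open the output once, outside the loop, so each file does not overwrite the previous one
    with out_path.open('w', newline='') as fout:
        for i, in_path in enumerate(sorted(Path(log_path).glob('*.csv'))):
            reader = pd.read_csv(in_path, index_col=False)
            # Write the selected columns; emit the column names only for the first file
            reader[header].to_csv(fout, index=False, header=(i == 0))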
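One possible way to keep only the rows under the first header is sketched below with the standard `csv` module. It rests on an assumption about the layout that the question does not confirm: the first header line is followed by its data rows, and the second header starts a section whose rows have a different number of fields. Reading stops at the first row whose field count differs from the first header's, which also means a blank line ends the section. The names `first_header_rows` and `combine` are hypothetical.

```python
import csv
from pathlib import Path


def first_header_rows(lines):
    """Yield the first header row, then the data rows that follow it.

    Assumption: rows under the second header have a different number
    of fields, so iteration stops at the first row whose length differs
    from the first header's length.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # empty input
    yield header
    for row in reader:
        if len(row) != len(header):
            break
        yield row


def combine(in_dir, out_file):
    """Write the first-header section of every CSV in in_dir to out_file."""
    out_file = Path(out_file)
    wrote_header = False
    with out_file.open("w", newline="") as fout:
        writer = csv.writer(fout)
        for in_path in sorted(Path(in_dir).glob("*.csv")):
            if in_path.resolve() == out_file.resolve():
                continue  # do not read the output file back in
            with in_path.open(newline="") as fin:
                rows = first_header_rows(fin)
                header = next(rows, None)
                if header is None:
                    continue  # skip empty files
                if not wrote_header:
                    writer.writerow(header)  # header is written once only
                    wrote_header = True
                writer.writerows(rows)
```

Calling `combine(log_path, "TOTAL.csv")` would then produce one file with a single header line. If the two header rows instead sit directly on top of each other, the stopping rule would need to change.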
The bash script below takes whatever arguments follow the command and treats them all as CSV files. It assumes that they all share the same one-line header, keeps the header from the first file only, and creates one combined CSV in the same directory as the first file.
#!/bin/bash
# combine selected items into one master csv
if [ "$1" == "" ]; then
    echo "Combine multiple CSV files into one preserving the header of the first file"
    echo "Output file is created in the same directory as the first file"
    echo " "
    echo "Use: "
    echo "$0 file1.csv file2.csv file3.csv pattern*.csv"
    exit 0
fi
FIRST=1
OUTPATH=$(dirname "$1")
TIME=$(date +%Y%m%d_%H%M)
OUTPUT=$OUTPATH/all_combined-$TIME.csv
for var in "$@"
do
    if [ $FIRST -gt 0 ]; then
        FIRST=0
        cat "$var" > "$OUTPUT"
    else
        tail -n+2 "$var" >> "$OUTPUT"
    fi
    echo "added $var to $OUTPUT"
done
Use: $ combine.sh one.csv two.csv HOME*.csv
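Since the question uses Python, the same idea as the bash script above can be sketched in Python: copy the first file whole, then copy every later file without its first line. This is a rough equivalent, not a drop-in replacement. The function names `combine_csv` and `default_output` are hypothetical, and like `tail -n+2` it works on physical lines, so it assumes the header occupies exactly one line.

```python
from datetime import datetime
from pathlib import Path


def combine_csv(paths, output):
    """Concatenate CSV files, keeping only the first file's header line."""
    # newline="" leaves the original line endings untouched on read and write
    with open(output, "w", newline="") as fout:
        for index, path in enumerate(paths):
            with open(path, newline="") as fin:
                if index > 0:
                    next(fin, None)  # skip this file's header line
                for line in fin:
                    fout.write(line)


def default_output(first_path):
    """Build a timestamped output path next to the first input file."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M")
    return Path(first_path).parent / f"all_combined-{stamp}.csv"
```

For example, `combine_csv(files, default_output(files[0]))` with `files = ["one.csv", "two.csv"]` would write the combined file next to `one.csv`.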