I am using bash and have a CSV file, samplefile.csv, with a single column (not a row) and no header:
111
222
333
444
555
666
777
888
I want to split this into, say, 2 CSV files of 4 rows and a single column each (with an odd number of rows, say 9, the files should get 5 and 4 rows):
output1.csv (1 column 4 rows)
111
222
333
444
and output2.csv (1 column and 4 rows)
555
666
777
888
csplit does not create .csv files, as shown in "split a file into x files where file names are numbered".
Any suggestions?
This is simple with awk. Try:
awk '{print $0 > ("output"i+1".csv")}!(NR%4){i++}' file
Demo:
$ ls
file
$ cat file
111
222
333
444
555
666
777
888
$ awk '{print $0 > ("output"i+1".csv")}!(NR%4){i++}' file
$ ls
file output1.csv output2.csv
$ cat output1.csv
111
222
333
444
$ cat output2.csv
555
666
777
888
Explanation:
The modulus operator is key here: we want to split the input after every fourth line:
$ awk '{print NR%4,$0}' file
1 111
2 222
3 333
0 444
1 555
2 666
3 777
0 888
The modulus (remainder) of NR divided by four is zero at every fourth line, so we use this fact to increment the file counter. !(NR%4) is shorthand for NR%4==0: zero evaluates as false, and NR%4 is zero exactly when we want the block {i++} to execute, so we negate it.
$ awk '{print NR%4,$0,"output"i+1".csv"}!(NR%4){i++}' file
1 111 output1.csv
2 222 output1.csv
3 333 output1.csv
0 444 output1.csv
1 555 output2.csv
2 666 output2.csv
3 777 output2.csv
0 888 output2.csv
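The one-liner above hard-codes a chunk size of 4. A minimal sketch that instead starts from the number of output files and rounds the chunk size up, so that 9 rows become 5 and 4, might look like this. The variable names `parts`, `total` and `size` are illustrative assumptions, not part of the original answer:

```shell
# Create sample input with 9 rows to exercise the odd case.
printf '%s\n' 111 222 333 444 555 666 777 888 999 > samplefile.csv

parts=2                                   # desired number of output files (assumption)
total=$(wc -l < samplefile.csv)           # number of rows in the input
size=$(( (total + parts - 1) / parts ))   # rows per file, rounded up

# Write each row to output<k>.csv, moving to the next file every $size rows.
awk -v size="$size" '{ print > ("output" (int((NR - 1) / size) + 1) ".csv") }' samplefile.csv
```

With more than two parts, rounding up can leave the last file shorter than the others, or produce fewer files than requested, so this is best read as a sketch for the two-file case in the question.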
What you are looking for is just the split command with the -n option:
split -nl/2 input output
will do the job for you. With the default suffixes, the two pieces are written to outputaa and outputab.
From the split man page:
-n, --number=CHUNKS
    generate CHUNKS output files. See below
CHUNKS may be:
    N       split into N files based on size of input
    K/N     output Kth of N to stdout
    l/N     split into N files without splitting lines
    l/K/N   output Kth of N to stdout without splitting lines
    r/N     like 'l' but use round robin distribution
    r/K/N   likewise but only output Kth of N to stdout
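To get names closer to the requested output1.csv and output2.csv, GNU split can also number the pieces and append an extension. This is a sketch assuming GNU coreutils 8.16 or later; as far as I know, `--numeric-suffixes=1` and `--additional-suffix` are not available in BSD/macOS split. Note that `l/N` balances the files by byte size, so equal row counts should only be expected when the rows have equal length:

```shell
printf '%s\n' 111 222 333 444 555 666 777 888 > input

# -n l/2                    two files, never splitting a line
# -a 1 --numeric-suffixes=1 one-digit numeric suffixes starting at 1
# --additional-suffix=.csv  append the extension to each output name
split -n l/2 -a 1 --numeric-suffixes=1 --additional-suffix=.csv input output

ls output*.csv
```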
This worked for me. I opened the resulting CSV in Excel and it was formatted correctly. I haven't yet figured out how to remove the trailing comma, but that seems acceptable under many CSV format definitions. The first xargs call appends a comma to each line of the file; the second batches four lines together. If you redirect the output to a file (> new.csv), it may be what you are looking for.
>cat my.csv
111
222
333
444
555
666
777
888
>cat my.csv | xargs -n 1 -i echo \{\}, | xargs -n 4
111, 222, 333, 444,
555, 666, 777, 888,
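If rows of four comma-separated values without the trailing comma are wanted, one alternative is paste. This is a different technique from the xargs pipeline above, shown only as a sketch. Each `-` consumes one line of standard input per output row:

```shell
printf '%s\n' 111 222 333 444 555 666 777 888 > my.csv

# Join every four input lines into one row, separated by commas.
paste -d, - - - - < my.csv > new.csv
cat new.csv
```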
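You can use the split command:
n=$(awk 'END{print int((NR+1)/2)}' file_name) && split -l "$n" -a 1 --numeric-suffixes=1 file_name output
Rounding up with (NR+1)/2 gives 5 and 4 rows for a 9-row file. The --numeric-suffixes=1 option is specific to GNU split; without it and -a 1, the pieces are named outputaa and outputab.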
cat output1
111
222
333
444
cat output2
555
666
777
888