Split a single column of a CSV horizontally into multiple smaller CSV files in bash

Question

I am using bash and have a CSV file, samplefile.csv, with a single column (not a row) and no header.

I want to split it into, say, 2 CSV files, each with a single column and 4 rows in this case (for an odd number of rows, say 9, then 5 and 4):

output1.csv (1 column, 4 rows)

and output2.csv (1 column, 4 rows)

csplit does not create .csv files, as shown in "split a file into x files where file names are numbered".

Any suggestions?

Answer 1

This is simple with awk. Try awk '{print $0 > ("output"i+1".csv")}!(NR%4){i++}' file.

Demo:

$ ls 
file

$ cat file 
111 
222 
333 
444 
555 
666 
777 
888

$ awk '{print $0 > ("output"i+1".csv")}!(NR%4){i++}' file

$ ls
file  output1.csv  output2.csv

$ cat output1.csv 
111 
222 
333 
444 

$ cat output2.csv 
555 
666 
777 
888

Explanation:

The modulus operator is key here: we want to start a new output file after every fourth line:

$ awk '{print NR%4,$0}' file
1 111
2 222
3 333
0 444
1 555
2 666
3 777
0 888

The modulus (remainder) of NR divided by four is zero at every fourth line, so we use this fact to increment the file counter. !(NR%4) is shorthand for NR%4==0: zero evaluates as false, and NR%4 is zero exactly when we want the block {i++} to execute, so we negate it.

$ awk '{print NR%4,$0,"output"i+1".csv"}!(NR%4){i++}' file
1 111 output1.csv
2 222 output1.csv
3 333 output1.csv
0 444 output1.csv
1 555 output2.csv
2 666 output2.csv
3 777 output2.csv
0 888 output2.csv
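
The same idea can be extended to the odd-row case from the question (9 rows into 5 and 4). What follows is only a sketch under assumptions: the sample data, the parts and size variable names, and the rounding-up step are illustrative and not part of the original answer. It counts the rows first, rounds the chunk size up, and passes it to awk with -v.

```shell
# Hypothetical sample input: nine one-field rows in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 111 222 333 444 555 666 777 888 999 > samplefile.csv

parts=2   # assumed number of output files

# Count the rows, then round the chunk size up so the first file takes the extra row.
total=$(wc -l < samplefile.csv)
size=$(( (total + parts - 1) / parts ))

# Same modulus test as above, with the chunk size passed in as an awk variable.
awk -v size="$size" '{ print > ("output" (i+1) ".csv") } !(NR % size) { i++ }' samplefile.csv

wc -l output*.csv
```

With two parts this gives 5 rows in output1.csv and 4 in output2.csv. For larger values of parts, rounding up can leave the last file noticeably shorter, or produce fewer files than requested, so treat it as a starting point.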

Answer 2

What you are looking for is just the split command with the -n option:

split -nl/2 input output

will do the job for you. The pieces are named outputaa and outputab.

From the split man page:

-n, --number=CHUNKS
       generate CHUNKS output files; see below

CHUNKS may be:
N       split into N files based on size of input
K/N     output Kth of N to stdout
l/N     split into N files without splitting lines
l/K/N   output Kth of N to stdout without splitting lines
r/N     like 'l' but use round robin distribution
r/K/N   likewise but only output Kth of N to stdout
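
If the pieces should carry a .csv extension, GNU split can add one itself. This is a sketch assuming GNU coreutils 8.16 or later (for --additional-suffix); the sample data and the output prefix are illustrative. Note that l/2 divides by byte size without breaking lines, so the line counts come out equal here only because every row has the same length.

```shell
# Hypothetical sample input: eight equal-length rows in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 111 222 333 444 555 666 777 888 > samplefile.csv

# -n l/2               two chunks, never breaking a line
# -d                   numeric suffixes (00, 01) instead of aa, ab
# --additional-suffix  append .csv to every output name (GNU only)
split -n l/2 -d --additional-suffix=.csv samplefile.csv output

ls
```

This should leave output00.csv and output01.csv next to the input file.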

Answer 3

This worked for me. I opened the resulting CSV in Excel and it was formatted correctly. I haven't yet figured out how to remove the trailing comma, but that seems acceptable under many CSV format definitions. The first xargs call adds a comma to each line of the file. The second xargs batches four together. If you redirect that to a file (> new.csv) it may be what you are looking for.

>cat my.csv
111
222
333
444
555
666
777
888 
>cat my.csv | xargs -I{} echo {}, | xargs -n 4
111, 222, 333, 444,
555, 666, 777, 888,
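
One way to avoid the trailing comma is paste, which joins lines with a delimiter placed only between fields. This is a sketch rather than part of the original answer; the file names follow the demo above and the sample data is recreated in a scratch directory.

```shell
# Hypothetical sample input matching the demo above.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 111 222 333 444 555 666 777 888 > my.csv

# Each "-" reads the next line from stdin, so four of them join
# four input lines into one output row, separated by commas.
paste -d, - - - - < my.csv > new.csv

cat new.csv
```

Here new.csv contains 111,222,333,444 on the first row and 555,666,777,888 on the second, with no trailing comma.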

Answer 4

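You can use the split command. The line count is rounded up so that 9 rows give 5 and 4, and the GNU options --numeric-suffixes=1 -a 1 name the pieces output1 and output2:

n=$(awk 'END{print int((NR+1)/2)}' file_name) && split -l "$n" --numeric-suffixes=1 -a 1 file_name output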

cat output1
111
222
333
444


cat output2 
555
666
777
888
