
Split large file based on column value - Linux

I want to split a large file (185 million records) into multiple files based on the value of one column. The file is a .dat file, and the columns are delimited by ^A (\u0001).

The file content looks like this:

194^A1^A091502^APR^AKIMBERLY^APO83^A^A^A^A0183^AUSA^A^A^A^A^A^A^A^A
194^A1^A091502^APR^AJOHN^APO83^A^A^A^A0183^AUSA^A^A^A^A^A^A^A^A
194^A^A091502^APR^AASHLEY^APO83^A^A^A^A0183^AUSA^A^A^A^A^A^A^A^A
194^A3^A091502^APR^APETER^APO83^A^A^A^A0183^AUSA^A^A^A^A^A^A^A^A
194^A4^A091502^APR^AJOE^APO83^A^A^A^A0183^AUSA^A^A^A^A^A^A^A^A

Now I want to split the file based on the value of the second column. In the third row the second column is empty; all rows with an empty second column should go into one file, and all remaining rows should go into another.

Please help me with this. I searched Google, and it seems awk is the tool to use here.

Regards, Shankar

With awk:

awk -F '\x01' '$2 == "" { print > "empty.dat"; next } { print > "normal.dat" }' filename

The file names can be chosen arbitrarily, of course. print > "file" prints the current record to a file named "file".
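To try the command on something smaller than 185 million records, the following sketch builds a shortened, hypothetical five-row sample and runs the same awk program on it. The name sample.dat and the truncated rows are assumptions made for illustration. The separator is passed as the literal 0x01 byte rather than as the '\x01' escape, which should behave the same way in awk implementations that do not understand hex escapes.

```shell
#!/bin/sh
# Run this in an empty scratch directory: it writes sample.dat,
# empty.dat and normal.dat into the current directory.

# ^A is the single byte 0x01; keep it in a variable for readability.
sep=$(printf '\001')

# Shortened, hypothetical rows (only the first five columns of the
# question's data). The third row has an empty second column.
{
  printf '%s\n' "194${sep}1${sep}091502${sep}PR${sep}KIMBERLY"
  printf '%s\n' "194${sep}1${sep}091502${sep}PR${sep}JOHN"
  printf '%s\n' "194${sep}${sep}091502${sep}PR${sep}ASHLEY"
  printf '%s\n' "194${sep}3${sep}091502${sep}PR${sep}PETER"
  printf '%s\n' "194${sep}4${sep}091502${sep}PR${sep}JOE"
} > sample.dat

# Same program as above: rows with an empty second field go to
# empty.dat, every other row goes to normal.dat.
awk -F "$sep" '$2 == "" { print > "empty.dat"; next } { print > "normal.dat" }' sample.dat

# Count the rows in each output file.
wc -l empty.dat normal.dat
```

With this sample, empty.dat should hold the single ASHLEY row and normal.dat the other four.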

Addendum in response to a comment: removing the second column from the output is a little trickier but certainly feasible. I'd use

awk -F '\x01' 'BEGIN { OFS = FS } { fname = $2 == "" ? "empty.dat" : "normal.dat"; for(i = 2; i < NF; ++i) $i = $(i + 1); --NF; print > fname }' filename

This works as follows:

BEGIN {                                          # output field separator is
  OFS = FS                                       # the same as input field
                                                 # separator, so that the
                                                 # rebuilt lines are formatted
                                                 # just like they came in
}
{
  fname = $2 == "" ? "empty.dat" : "normal.dat"  # choose file name

  for(i = 2; i < NF; ++i) {                      # set all fields after the
    $i = $(i + 1)                                # second back one position
  }

  --NF                                           # let awk know the last field
                                                 # is not needed in the output

  print > fname                                  # then print to file.
}
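Decrementing NF to drop the last field is not handled the same way by every awk implementation (the GNU awk manual cautions that some versions do not rebuild the record). As a hedged alternative, not part of the original answer, the sketch below builds the output line by concatenating every field except the second, so it relies on neither --NF nor OFS. The file names sample2.dat, empty_cut.dat and normal_cut.dat and the two shortened rows are hypothetical.

```shell
#!/bin/sh
# Run this in an empty scratch directory: it writes sample2.dat,
# empty_cut.dat and normal_cut.dat into the current directory.

# ^A is the single byte 0x01.
sep=$(printf '\001')

# Two shortened, hypothetical rows: one with a value in column 2,
# one with column 2 empty.
{
  printf '%s\n' "194${sep}1${sep}091502${sep}JOHN"
  printf '%s\n' "194${sep}${sep}091502${sep}ASHLEY"
} > sample2.dat

awk -F "$sep" '
{
  # choose the output file from the second field, as before
  fname = ($2 == "") ? "empty_cut.dat" : "normal_cut.dat"

  # rebuild the line from field 1 and fields 3..NF, skipping field 2
  out = $1
  for (i = 3; i <= NF; ++i)
    out = out FS $i

  print out > fname
}' sample2.dat

# Show the results with ^A made visible as "|".
tr '\001' '|' < normal_cut.dat
tr '\001' '|' < empty_cut.dat
```

For this sample the two tr commands should print 194|091502|JOHN and 194|091502|ASHLEY respectively.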
