
In Linux shell scripting, how can I remove duplicate lines without using the sort -u command?
I want to sort by date on the 5th column using the sort command. The problem is that the date format is inconsistent (the day has one or two digits). How can I do it?
M_ID,M_NAME,DEPT_ID,START_DATE,END_DATE,Salary
M001,Richa,D001,27-Jan-07,27-Feb-07,150000
M002,Nitin,D002,16-Feb-07,16-May-07,40000
M003,AJIT,D003,8-Mar-07,8-Sep-07,70000
M004,SHARVARI,D004,28-Mar-07,28-Mar-08,120000
M005,ADITYA,D002,27-Apr-07,27-Jul-07,40000
M006,Rohan,D004,12-Apr-07,12-Apr-08,130000
M007,Usha,D003,17-Apr-07,17-Oct-07,70000
M008,Anjali,D002,2-Apr-07,2-Jul-07,40000
M009,Yash,D006,11-Apr-07,11-Jul-07,85000
M010,Nalini,D007,15-Apr-07,15-Oct-07,9999
tail -10 Joining_date.txt|awk -F\, '{print $1,$2,$3,$4,$5|("sort -t, -M");$6} '
Expected output:
M001,Richa,D001,27-Jan-07,27-Feb-07,150000
M002,Nitin,D002,16-Feb-07,16-May-07,40000
M008,Anjali,D002,2-Apr-07,2-Jul-07,40000
M009,Yash,D006,11-Apr-07,11-Jul-07,85000
M005,ADITYA,D002,27-Apr-07,27-Jul-07,40000
M003,AJIT,D003,8-Mar-07,8-Sep-07,70000
M010,Nalini,D007,15-Apr-07,15-Oct-07,9999
M007,Usha,D003,17-Apr-07,17-Oct-07,70000
M004,SHARVARI,D004,28-Mar-07,28-Mar-08,120000
M006,Rohan,D004,12-Apr-07,12-Apr-08,130000
Using any awk+sort+cut, apply the DSU (Decorate-Sort-Undecorate) idiom:
$ awk '
BEGIN { FS=","; OFS="\t" }
{
    # d[1] = day, d[2] = month abbreviation, d[3] = 2-digit year
    split($5, d, "-")
    # index() returns 1, 4, 7, ... for Jan, Feb, Mar, ...; map that to 1..12
    mth = (index("JanFebMarAprMayJunJulAugSepOctNovDec", d[2]) + 2) / 3
    date = sprintf("%04d%02d%02d", d[3], mth, d[1])
    # Decorate: header flag, sortable date, input line number, original line
    print (NR>1), date, NR, $0
}
' file |
sort -n -k1,1 -k2,2 -k3,3 |
cut -f4-
M_ID,M_NAME,DEPT_ID,START_DATE,END_DATE,Salary
M001,Richa,D001,27-Jan-07,27-Feb-07,150000
M002,Nitin,D002,16-Feb-07,16-May-07,40000
M008,Anjali,D002,2-Apr-07,2-Jul-07,40000
M009,Yash,D006,11-Apr-07,11-Jul-07,85000
M005,ADITYA,D002,27-Apr-07,27-Jul-07,40000
M003,AJIT,D003,8-Mar-07,8-Sep-07,70000
M010,Nalini,D007,15-Apr-07,15-Oct-07,9999
M007,Usha,D003,17-Apr-07,17-Oct-07,70000
M004,SHARVARI,D004,28-Mar-07,28-Mar-08,120000
M006,Rohan,D004,12-Apr-07,12-Apr-08,130000
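The above guarantees that the header line is printed first even if its 5th field contains digits, and that all lines with the same date in the 5th field are printed in their original input order. If you really don't want the header line printed, just change the second { to NR>1 {.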
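To make the "decorate" step of the idiom concrete, here is a minimal sketch that runs only the awk stage, so you can see what gets handed to sort. The function name decorate is a hypothetical helper introduced for illustration, and the split on "-" assumes every date looks like 8-Sep-07.

```shell
# Decorate step only: prefix each line with a header flag, a sortable
# YYYYMMDD-style key built from field 5, and the input line number.
# "decorate" is a hypothetical helper name, not part of the answer above.
decorate() {
  awk '
    BEGIN { FS=","; OFS="\t" }
    {
      split($5, d, "-")   # assumes day-Mon-yy, e.g. 8-Sep-07
      mth = (index("JanFebMarAprMayJunJulAugSepOctNovDec", d[2]) + 2) / 3
      date = sprintf("%04d%02d%02d", d[3], mth, d[1])
      print (NR>1), date, NR, $0
    }'
}

# Feed a header plus two sample rows through the decorate step.
printf '%s\n' \
  'M_ID,M_NAME,DEPT_ID,START_DATE,END_DATE,Salary' \
  'M003,AJIT,D003,8-Mar-07,8-Sep-07,70000' \
  'M008,Anjali,D002,2-Apr-07,2-Jul-07,40000' | decorate
```

For the row ending 8-Sep-07 the second column is 00070908, and for 2-Jul-07 it is 00070702, so a numeric sort on that column puts M008 ahead of M003. The final cut -f4- then strips the three helper columns again.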
Like this:
tail -n+2 Joining_date.txt | sed -E 's/^(([^,]+,){4})([0-9]-)/\10\3/' | LC_ALL=C sort -t ',' -k 5.8n -k 5.4M -k 5.1n
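tail -n+2 -- prints from the second line to the end (skips the header).
sed -E 's/^(([^,]+,){4})([0-9]-)/\10\3/' -- zero-pads a single-digit day in the 5th field to two digits.
sort -t ',' -k 5.8n -k 5.4M -k 5.1n
    -t ',' -- sets the field separator for sort (,).
    -k 5.8n -- sorts first by year, numerically.
    -k 5.4M -- then sorts by month.
    -k 5.1n -- then sorts by day, numerically.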
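The padding and the character-offset keys can be checked on a few rows in isolation. The following is a sketch only: pad_day and sort_by_end_date are hypothetical helper names, and the keys are given explicit end positions (5.8,5.9 and so on) as a variation on the command above, on the assumption that field 5 is always dd-Mon-yy after padding. The M month key is a sort extension (available in GNU and BSD sort) rather than POSIX.

```shell
# Zero-pad a single-digit day in field 5 so every date is dd-Mon-yy.
# Hypothetical helper name; the sed expression is the one from the answer.
pad_day() {
  sed -E 's/^(([^,]+,){4})([0-9]-)/\10\3/'
}

# Sort by year (chars 8-9), then month (chars 4-6), then day (chars 1-2)
# of field 5. Bounded keys are an assumption-based variant of -k 5.8n etc.
sort_by_end_date() {
  pad_day | LC_ALL=C sort -t ',' -k 5.8,5.9n -k 5.4,5.6M -k 5.1,5.2n
}

printf '%s\n' \
  'M004,SHARVARI,D004,28-Mar-07,28-Mar-08,120000' \
  'M008,Anjali,D002,2-Apr-07,2-Jul-07,40000' \
  'M002,Nitin,D002,16-Feb-07,16-May-07,40000' | sort_by_end_date
```

The rows come out in the order M002, M008, M004. Note that the padded zero stays in the output (02-Jul-07), since nothing in the pipeline removes it afterwards.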