Check if all lines in a file are in the same format
I would like to write a small shell script that checks whether all lines in a file have the same number of ; characters.
I have a file in the following format:
$ cat filename.txt
34567890;098765456789;098765567;9876;9876;EXTG;687J;
4567800987987;09876789;9667876YH;9876;098765;098765;09876;
SLKL987H;09876LKJ;POIUYT;PÖIUYT;88765K;POIUYTY;LKJHGFDF;
TYUIO;09876LKJ;POIUYT;LKJHG;88765K;POIUYTY;OIUYT;
...
...
...
SDFGHJK;RTYUIO9876;4567890LKJHGFD;POIUYTRF56789;POIUY;POIUYT;9876;
I use the following command to determine the number of ; on each line:
awk -F';' 'NF{print (NF-1)}' filename.txt
I get the following output:
7
7
7
7
...
...
...
7
This is because the number of ; on each line of this file is 7.
Now I want to write a script that verifies that every line in the file has 7 semicolons. If so, it tells me that the file is correct. Otherwise, if even a single line contains more than 7 semicolons, it tells me that the file is not correct.
Rather than printing output, return a value, e.g.:
awk -F';' 'NR==1{count = NF} NF!=count{status=1}END{exit status}' filename.txt
If there are no lines, or if all lines contain the same number of semicolons, this returns 0. Otherwise, it returns 1 to indicate failure.
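A minimal sketch of how that exit status could drive an if statement. The function names same_field_count and check_file are hypothetical, not part of the answer above, and `status + 0` is only a precaution so that an unset variable is passed to exit as the number 0.

```shell
# Hypothetical helper: exit status 0 when every line of the file has the
# same number of ';'-separated fields as the first line, 1 otherwise.
same_field_count() {
    awk -F';' 'NR==1 {count = NF} NF != count {status = 1} END {exit status + 0}' "$1"
}

# Hypothetical caller: branch on the exit status instead of parsing output.
check_file() {
    if same_field_count "$1"; then
        echo "consistent"
    else
        echo "inconsistent"
    fi
}
```

It would be called as check_file filename.txt.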
Count the number of unique lines in that output and verify that the count is 1.
if (($(awk -F';' 'NF{print (NF-1)}' filename.txt | uniq | wc -l) == 1)); then
echo good
else
echo bad
fi
Just pipe the result through sort -u | wc -l. If all lines have the same number of fields, this will produce one line of output.
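A small sketch of that pipeline wrapped in a function. The name distinct_counts is hypothetical, and the trailing tr only strips the padding that some wc implementations put in front of the number. Unlike plain uniq, which merges only adjacent duplicates, sort -u makes the result the true number of distinct counts.

```shell
# Hypothetical helper: print how many distinct per-line ';' counts the file has.
# Empty lines are skipped by the NF pattern, as in the question's command.
distinct_counts() {
    awk -F';' 'NF {print NF - 1}' "$1" | sort -u | wc -l | tr -d ' '
}
```

A result of 1 means every non-empty line has the same number of semicolons.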
Alternatively, just look for a line in awk that doesn't have the same number of fields as the first line.
awk -F';' 'NR==1 {linecount=NF}
linecount != NF { print "Bad line " $0; exit 1}
' filename.txt && echo "Good file"
You can also adapt the old trick used to output only the first of duplicate lines.
awk -F';' '{a[NF]=1}; length(a) > 1 {exit 1}' filename.txt
Each line records its field count as a key in a. Exit with status 1 as soon as a has more than one entry. Basically, a acts as a set of all field counts seen so far.
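Calling length() on an array may not work in some older awk implementations. As a hedged alternative, the same set idea can be sketched with an explicit counter; the names all_same_nf, seen and n are made up for this illustration.

```shell
# Hypothetical helper: exit status 1 as soon as a second distinct field
# count appears, 0 if the whole file uses a single field count.
all_same_nf() {
    awk -F';' '!(NF in seen) {seen[NF] = 1; if (++n > 1) exit 1}' "$1"
}
```

It can be chained like the earlier one-liner: all_same_nf filename.txt && echo "Good file".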
Based on all the information you have given me, I ended up doing the following, and it works for me.
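# Number of fields on the first line (that is, the number of ';' plus one)
nbCol=$(awk -F';' 'NR==1 {print NF}' "$1")
# Expected value of nbCol
val=7
# Exit with status 1 as soon as a line's field count differs from the first line's
awk -F';' 'NR==1 {count = NF} NF != count {exit 1}' "$1"
result=$?
if [ "$result" -eq 0 ] && [ "$nbCol" -eq "$val" ]; then
    echo "Good Format"
else
    echo "Bad Format"
fi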
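For comparison, a sketch that does both checks in a single awk pass and reports which lines are off. The function name check_format and the variable want are assumptions for illustration; the expected number of semicolons is passed as the second argument, and empty lines are ignored.

```shell
# Hypothetical function: check_format FILE EXPECTED_SEMICOLONS
# Prints one message per offending line and exits 1 if any line is off.
check_format() {
    awk -F';' -v want="$2" '
        NF && NF - 1 != want {
            printf "line %d has %d separators, expected %d\n", NR, NF - 1, want
            bad = 1
        }
        END {exit bad + 0}
    ' "$1"
}
```

It could be used as: check_format filename.txt 7 && echo "Good Format" || echo "Bad Format".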