Check if all lines in a file are in the same format

I would like to write a little shell script that checks whether all lines in a file have the same number of semicolons (;).

I have a file in the following format:

$ cat filename.txt

34567890;098765456789;098765567;9876;9876;EXTG;687J;
4567800987987;09876789;9667876YH;9876;098765;098765;09876;
SLKL987H;09876LKJ;POIUYT;PÖIUYT;88765K;POIUYTY;LKJHGFDF;
TYUIO;09876LKJ;POIUYT;LKJHG;88765K;POIUYTY;OIUYT;
...
...
...
SDFGHJK;RTYUIO9876;4567890LKJHGFD;POIUYTRF56789;POIUY;POIUYT;9876;

I use the following command to determine the number of semicolons on each line:

awk -F';' 'NF{print (NF-1)}' filename.txt

I get the following output:

7
7
7
7
...
...
...
7

This is because every line of this file contains 7 semicolons.

Now I want to write a script that verifies that all lines in the file have 7 semicolons. If they do, it should tell me that the file is correct. Otherwise, if even a single line contains more than 7 semicolons, it should tell me that the file is not correct.

Rather than printing output, return an exit status. For example:

awk -F';' 'NR==1{count = NF} NF!=count{status=1}END{exit status}' filename.txt

If the file is empty or every line has the same number of fields as the first line, awk exits with status 0. Otherwise it exits with status 1 to indicate failure.
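To act on that exit status in a script, the one-liner can be wrapped in a function and used directly as the condition of an if. This is a minimal sketch: same_field_count is a hypothetical name, it uses ';' as the separator to match the sample file, and the temporary file only makes the example self-contained.

```sh
#!/bin/sh
# Hypothetical helper: exit status 0 if every line has the same number of
# ';'-separated fields as line 1 (or the file is empty), 1 otherwise.
same_field_count() {
    awk -F';' 'NR==1 { count = NF } NF != count { status = 1 } END { exit status }' "$1"
}

# Demo on a throwaway file.
tmp=$(mktemp)
printf '1;2;3;\n4;5;6;\n' > "$tmp"
if same_field_count "$tmp"; then
    echo "file is correct"
else
    echo "file is not correct"
fi
rm -f "$tmp"
```

Because the function's status is awk's status, it also composes with && and ||.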

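Count the number of distinct per-line counts and verify that it is 1. Plain uniq (without sort) is enough here: if every count is the same, uniq collapses them into a single line, and any difference leaves at least two.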

if (($(awk -F';' 'NF{print (NF-1)}' filename.txt | uniq | wc -l) == 1)); then
    echo good
else
    echo bad
fi

Just pipe the result through sort -u | wc -l. If all lines have the same number of fields, sort -u leaves a single line, so wc -l prints 1.
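A minimal sketch of the sort -u | wc -l approach as a reusable function, assuming the per-line counts come from the question's awk command. distinct_counts is a hypothetical helper name, and the temporary file is only there to make the example runnable on its own.

```sh
#!/bin/sh
# Hypothetical helper: print how many distinct semicolon counts occur
# among the non-empty lines of the file given as $1.
distinct_counts() {
    awk -F';' 'NF { print NF - 1 }' "$1" | sort -u | wc -l
}

tmp=$(mktemp)
printf '1;2;3;\n4;5;6;\n7;8;9;\n' > "$tmp"
n=$(distinct_counts "$tmp")
# $n is deliberately unquoted: some wc implementations pad the number with spaces.
if [ $n -eq 1 ]; then
    echo good      # exactly one distinct count: every line matches
else
    echo bad       # zero (no non-empty lines) or several distinct counts
fi
rm -f "$tmp"
```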

Alternatively, use awk to look for a line that doesn't have the same number of fields as the first line.

awk -F';' 'NR==1 {linecount=NF}
           linecount != NF { print "Bad line " $0; exit 1}
          ' filename.txt && echo "Good file"
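If you also want to know which line is wrong, awk's NR gives the line number. A minimal sketch, assuming blank lines should count as mismatches (as they do in the command above); first_line_check is a hypothetical name.

```sh
#!/bin/sh
# Report the first line whose field count differs from line 1, with its
# line number (NR), then print a verdict and return 0 (good) or 1 (bad).
first_line_check() {
    if awk -F';' '
        NR == 1 { fields = NF }          # field count of line 1 is the reference
        NF != fields {
            printf "Bad line %d (%d fields, expected %d): %s\n", NR, NF, fields, $0
            exit 1                       # stop at the first mismatch
        }
    ' "$1"; then
        echo "Good file"
    else
        echo "Bad file"
        return 1
    fi
}
```

Using if/else instead of cmd && echo ... || echo ... avoids running the "bad" branch when the first echo itself fails.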

You can also adapt the old trick for printing only the first occurrence of duplicate lines (awk '!a[$0]++').

awk -F';' '{a[NF]=1}; length(a) > 1 {exit 1}' filename.txt

Each line records its field count as a key in a. As soon as a has more than one key, awk exits with status 1. In effect, a acts as a set of all field counts seen so far.
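Calling length() on an array is a common extension rather than something every awk accepts, so a more portable sketch of the same set idea keeps its own count of distinct keys. has_one_field_count is a hypothetical name.

```sh
#!/bin/sh
# Portable take on the "set of field counts" trick: instead of length(a),
# count distinct keys by hand and exit 1 once a second one appears.
has_one_field_count() {
    awk -F';' '
        !(NF in seen) { seen[NF] = 1; n++ }   # first time this field count appears
        n > 1 { exit 1 }                      # a second distinct count: not uniform
    ' "$1"
}

tmp=$(mktemp)
printf '1;2;3;\n4;5;6;\n' > "$tmp"
has_one_field_count "$tmp" && echo "same number of fields on every line"
rm -f "$tmp"
```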

Based on all the information you have given me, I ended up doing the following, and it works for me:

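val=7                                                  # expected number of ';' per line
nbCol=$(awk -F';' 'NR==1{print NF-1; exit}' "$1")      # ';' count on line 1 (trailing ';' adds an empty field)
awk -F';' 'NR==1{count = NF} NF != count {exit 1}' "$1"
result=$?                                              # 0 if every line matches the first

if [ "$result" -eq 0 ] && [ "$nbCol" = "$val" ]; then
    echo "Good Format"
else
    echo "Bad Format"
fi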
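The two checks in the script above (the first line has 7 semicolons, and every line matches the first) together amount to "every line has exactly 7 semicolons", which a single awk pass can test. A minimal sketch under that reading: check_format and its default of 7 are assumptions, and an empty file is treated as bad, as it is in the script above.

```sh
#!/bin/sh
# Hypothetical helper: exit status 0 only if the file has at least one line
# and every line contains exactly $2 semicolons (default 7).
check_format() {
    awk -F';' -v want="${2:-7}" '
        (NF - 1) != want { bad = 1; exit }   # wrong number of ";" on this line
        END { if (bad || NR == 0) exit 1 }   # mismatch found, or no lines at all
    ' "$1"
}

tmp=$(mktemp)
printf '1;2;3;4;5;6;7;\nA;B;C;D;E;F;G;\n' > "$tmp"
if check_format "$tmp" 7; then echo "Good Format"; else echo "Bad Format"; fi
rm -f "$tmp"
```

A bare exit in the main rule still runs the END rule, which is why the result is carried in the bad flag and the exit status is decided there.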

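Statement: technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM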