
How do I count the number of rows and columns in a file using bash?

Say I have a large file with many rows and many columns. I'd like to find out how many rows and columns I have using bash.

Columns: awk '{print NF}' file | sort -nu | tail -n 1

Use head -n 1 for the lowest column count, tail -n 1 for the highest column count.

Rows: cat file | wc -l, or wc -l < file for the UUOC crowd.
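The commands above can be wrapped into one helper that reports the row count together with the smallest and largest column count, which also shows at a glance whether every row has the same width. This is only a sketch: table_shape is a hypothetical name, and fields are assumed to be separated by whitespace (awk's default).

```shell
# Sketch: report the row count plus the smallest and largest column count.
# table_shape is a hypothetical helper name; fields are split on whitespace.
table_shape() {
    local file="$1"
    local rows min max
    rows=$(wc -l < "$file")
    # sort -nu leaves one line per distinct NF value, in ascending order
    min=$(awk '{print NF}' "$file" | sort -nu | head -n 1)
    max=$(awk '{print NF}' "$file" | sort -nu | tail -n 1)
    # $((rows)) strips the padding some wc implementations add
    echo "rows=$((rows)) min_cols=$min max_cols=$max"
}
```

If min_cols and max_cols differ, the file is ragged and a single "number of columns" does not exist for it.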

Alternatively, to count columns, count the separators between columns. I find this to be a good balance of brevity and ease of remembering. Of course, this won't work if your data include the column separator.

head -n1 myfile.txt | grep -o " " | wc -l

Uses head -n1 to grab the first line of the file. Uses grep -o to find all the spaces and output each one on its own line. Uses wc -l to count those lines.

EDIT: As Gaurav Tuli points out below, I forgot to mention that you have to mentally add 1 to the result, or otherwise script this math.
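Scripting the "add 1" step could look like the sketch below. count_cols_by_sep is a hypothetical helper name, the separator argument is an addition of mine (it defaults to a single space, as in the command above), and the caveat about data containing the separator still applies.

```shell
# count_cols_by_sep is a hypothetical helper.
# Usage: count_cols_by_sep FILE [SEPARATOR]   (the separator defaults to one space)
count_cols_by_sep() {
    local file="$1"
    local sep="${2:- }"
    local nsep
    # grep -o prints each match on its own line; -F treats the separator literally
    nsep=$(head -n 1 "$file" | grep -oF -- "$sep" | wc -l)
    # N separators delimit N + 1 columns
    echo $((nsep + 1))
}
```

For example, count_cols_by_sep myfile.txt counts space-separated columns, and count_cols_by_sep myfile.csv ',' counts comma-separated ones.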

If your file is big but you are certain that the number of columns remains the same for each row (and you have no heading), use:

head -n 1 FILE | awk '{print NF}'

to find the number of columns, where FILE is your file name.

To find the number of lines, 'wc -l FILE' will work.

A little twist on kirill_igum's answer lets you count the number of columns of any particular row, which is why I came to this question, even though it asks about the whole file. (If your file has the same number of columns in every line, this still works, of course):

head -2 file |tail -1 |tr '\t' '\n' |wc -l

Gives the number of columns of row 2. Replace 2 with 55, for example, to get it for row 55.

-bash-4.2$ cat file
1       2       3
1       2       3       4
1       2
1       2       3       4       5

-bash-4.2$ head -1 file |tail -1 |tr '\t' '\n' |wc -l
3
-bash-4.2$ head -4 file |tail -1 |tr '\t' '\n' |wc -l
5

The code above works if your file is separated by tabs, because that is the character we pass to tr. If your file has another separator, say commas, you can still count your "columns" with the same trick by changing the separator character from '\t' to ',':

-bash-4.2$ cat csvfile
1,2,3,4
1,2
1,2,3,4,5
-bash-4.2$ head -2 csvfile |tail -1 |tr '\,' '\n' |wc -l
2
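The same per-row count can be done in a single awk call instead of the head | tail | tr | wc pipeline. This is an alternative technique, not the one used above; cols_in_row is a hypothetical helper name, and the delimiter is assumed to be a tab unless a third argument is given.

```shell
# cols_in_row is a hypothetical helper: print the column count of line N.
# Usage: cols_in_row FILE N [DELIMITER]   (the delimiter defaults to a tab)
cols_in_row() {
    local file="$1"
    local row="$2"
    local delim="${3:-\t}"   # awk itself turns the two characters \t into a tab
    # NR == n selects the requested line; exit stops reading the rest of the file
    awk -F"$delim" -v n="$row" 'NR == n { print NF; exit }' "$file"
}
```

With the sample files above, cols_in_row file 4 should print 5 and cols_in_row csvfile 2 ',' should print 2.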

If counting the number of columns in the first line is enough, try the following:

awk -F'\t' '{print NF; exit}' myBigFile.tsv

where \t is the column delimiter.

You can use bash. Note that for very large files (in the GB range) you should use awk or wc instead. However, performance should still be manageable for files of a few MB.

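declare -i count=0
# IFS= and -r keep each line intact while it is read
while IFS= read -r
do
    ((count++))
done < file
echo "line count: $count"

head -1 file.tsv | tr '\t' '\n' | wc -l

Take the first line, change tabs to newlines (for comma-separated files, use ',' instead of '\t'), and count the number of lines.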

awk 'BEGIN{FS=","}END{print "COLUMN NO: "NF " ROWS NO: "NR}' file

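You can use any delimiter as the field separator and find the number of rows and columns. Note that in the END block NF holds the field count of the last line read, so the column count is only meaningful if every line has the same number of fields.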
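For files whose rows differ in width, the one-liner can be adapted to track the widest row instead of relying on the last one. A minimal sketch, assuming a comma delimiter by default; rows_and_cols is a hypothetical helper name and the output wording is mine.

```shell
# rows_and_cols is a hypothetical helper.
# Usage: rows_and_cols FILE [DELIMITER]   (the delimiter defaults to a comma)
rows_and_cols() {
    local file="$1"
    local delim="${2:-,}"
    awk -F"$delim" '
        NF > max { max = NF }   # remember the widest row seen so far
        END { print "columns: " (max + 0) " rows: " NR }
    ' "$file"
}
```

On a file containing the two lines 1,2,3 and 1,2 this reports 3 columns, whereas reading NF in the END block alone would report 2.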

A simple row count is $(wc -l "$file"). Use $(wc -lL "$file") to show both the number of lines and the number of characters in the longest line (-L is not part of POSIX wc, but GNU wc provides it).
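When wc is given a file argument it prints the file name after the counts, so a script that wants bare numbers usually reads from standard input instead. A minimal sketch, assuming a wc that supports -L; line_stats is a hypothetical helper name.

```shell
# line_stats is a hypothetical helper; it assumes wc supports the non-POSIX -L option.
line_stats() {
    local file="$1"
    # reading from stdin keeps the file name out of the output;
    # the unquoted substitution splits the two counts into $1 and $2
    set -- $(wc -lL < "$file")
    echo "lines=$1 longest=$2"
}
```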

For rows you can simply use wc -l file

-l counts the total number of lines

For columns you can simply use head -1 file | tr ";" "\n" | wc -l

Explanation
head -1 file
Grabs the first line of your file, which should be the headers, and sends it to the next command through the pipe.
| tr ";" "\n"

tr stands for translate.
It will translate all ; characters into newline characters.
In this example ; is your delimiter.

Then it sends the data to the next command.

wc -l
Counts the total number of lines.

Perl solution:

perl -ane '$maxc = $#F if $#F > $maxc; END{$maxc++; print "max columns: $maxc\nrows: $.\n"}' file

If your input file is comma-separated:

perl -F, -ane '$maxc = $#F if $#F > $maxc; END{$maxc++; print "max columns: $maxc\nrows: $.\n"}' file

Output:

max columns: 5
rows: 2

-a autosplits each input line into the @F array
$#F is the number of columns minus 1
-F, sets the field separator to , instead of whitespace
$. is the line number (number of rows)

A very simple way to count the columns of the first line in pure bash (no awk, perl, or other languages):

read -r line < "$input_file"
ncols=$(echo "$line" | wc -w)

This will work if your columns are separated by whitespace and no field contains whitespace of its own.
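The word count can also be done without calling wc at all, by letting the shell split the line itself. A sketch under the same whitespace-separated assumption; count_fields_pure is a hypothetical helper name, and globbing is switched off temporarily so that a field such as * is counted rather than expanded.

```shell
# count_fields_pure is a hypothetical helper that uses only shell builtins.
count_fields_pure() {
    local file="$1"
    local line
    # read the first line; fail only if the file is completely empty
    IFS= read -r line < "$file" || [ -n "$line" ] || return 1
    set -f          # disable globbing so a field such as * is not expanded
    set -- $line    # unquoted on purpose: split the line on whitespace
    set +f
    echo "$#"       # the number of positional parameters is the field count
}
```

Note that set +f re-enables globbing unconditionally, so a caller that had disabled it would need to restore that setting.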

The following code will do the job and lets you specify the field delimiter. This is especially useful for files containing more than 20k lines.

awk 'BEGIN {
  FS = "|"
}
NR == 1 {
  min = NF   # seed min from the first row instead of a fixed 10000
}
{
  if (NF > max) max = NF
  if (NF < min) min = NF
}
END {
  print "Max=" max
  print "Min=" min
}' myPipeDelimitedFile.dat
