
How do I count the number of rows in a large CSV file with Perl?

I have to use Perl in a Windows environment at work, and I need to find out how many rows a large CSV file contains (about 1.4 GB). Any idea how to do this with minimal waste of resources?

Thanks

PS: This must be done within the Perl script, and we're not allowed to install any new modules on the system.

Do you mean lines or rows? A cell may contain line breaks, which add lines to the file but not rows. If you are guaranteed that no cell contains a line break, then just use the technique in the Perl FAQ. Otherwise, you will need a proper CSV parser such as Text::xSV.
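For the case where no cell contains a line break, here is a minimal sketch of chunked newline counting, along the lines of the approach in perlfaq5. The sub name `count_lines`, the 64 KB buffer size, and the script name in the usage comment are illustrative choices, not something from the thread. It uses only core Perl, so no extra modules are needed.

```perl
use strict;
use warnings;

# Count newline characters by reading the file in fixed-size chunks,
# so memory use stays constant no matter how large the file is.
sub count_lines {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my ($lines, $buffer) = (0, '');
    while (sysread $fh, $buffer, 65536) {
        # tr/\n// with an empty replacement only counts matches.
        $lines += ($buffer =~ tr/\n//);
    }
    close $fh;
    return $lines;
}

# Hypothetical usage: perl count_lines.pl myfile.csv
print count_lines($ARGV[0]), "\n" if @ARGV;
```

Note that this counts newline characters, so a final line with no trailing newline is not included, and a header line (if any) is counted like every other line.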

Yes, don't use perl.

Instead, use the simple utility for counting lines: wc.exe

It's part of a suite of Windows utilities ported from the Unix originals.

http://unxutils.sourceforge.net/

For example:

PS D:\> wc test.pl
     12      26     271 test.pl
PS D:\>

Where 12 == number of lines, 26 == number of words, 271 == number of bytes.

If you really have to use perl:

D:\>perl -lne "END{print $.;}" < test.pl
12
perl -lne "END { print $. }" myfile.csv

This reads only one line at a time, so it won't waste any memory unless individual lines are extremely long.

This one-liner handles line breaks within rows:

  1. It looks for lines with an odd number of quotes.
  2. It takes into account that doubled quotes are a way of indicating quotes within a field.
  3. It uses the awesome flip-flop operator.

     perl -ne 'BEGIN{$re=qr/^[^"]*(?:"[^"]*"[^"]*)*?"[^"]*$/;}END{print"Count: $t\n";}$t++ unless /$re/../$re/' 

Consider:

  • wc is not going to work. It's awesome for counting lines, but not CSV rows.
  • You should install (or fight to install) Text::CSV or some similar standard package for proper handling.
  • This may get you there, nonetheless.


EDIT: It slipped my mind that this was Windows:

 perl -ne "BEGIN{$re=qr/^[^\"]*(?:\"[^\"]*\"[^\"]*)*?\"[^\"]*$/;}END{print qq/Count: $t\n/;};$t++ unless $pq and $pq = /$re/../$re/;" 

The weird thing is that The Broken OS' shell interprets && as the OS conditional exec, and I couldn't do anything to change its mind! If I escaped it, it would just pass it to perl that way.
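The quote-parity idea behind the one-liners above can also be written out as a short script, which avoids shell quoting problems entirely. This is a sketch under stated assumptions, not a replacement for a real CSV parser: it assumes the file follows the usual convention that a literal quote inside a field is written as two quotes, and the sub name `count_csv_rows` is purely illustrative. It needs no modules beyond core Perl.

```perl
use strict;
use warnings;

# Count CSV records, treating a line break inside a quoted field as part
# of the same record. Assumes a literal quote inside a field is doubled ("").
sub count_csv_rows {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my ($rows, $in_quotes) = (0, 0);
    while (my $line = <$fh>) {
        # An odd number of quote characters flips the inside/outside state;
        # doubled quotes add two to the count and leave the state unchanged.
        my $quotes = ($line =~ tr/"//);
        $in_quotes = !$in_quotes if $quotes % 2;
        $rows++ unless $in_quotes;    # the record ends on this line
    }
    close $fh;
    $rows++ if $in_quotes;            # dangling record in malformed input
    return $rows;
}

# Hypothetical usage: perl count_csv_rows.pl myfile.csv
print count_csv_rows($ARGV[0]), "\n" if @ARGV;
```

Like the one-liner, it reads one line at a time, so memory use stays small. Empty lines and a header line are counted as rows; subtract or skip them if that is not what you want.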

Upvote for edg's answer. Another option is to install Cygwin to get wc and some other handy utilities on Windows.

I was being idiotic; the simple way to do it in the script is:

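# Three-argument open with a lexical filehandle; $! reports why open failed.
open my $extract, '<', $extractFileName
    or die "Cannot read row count of $extractFileName: $!";
my $rowCount = 0;
# Read one line at a time so memory use stays small.
while (<$extract>)
{
    $rowCount++;
}

close($extract);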
