
Perl split 8GB CSV with "," as pattern

I recognise this might be a duplicate, but the size of the file I have to split requires a method which doesn't load the CSV into memory before processing it, i.e. I'm looking for a line-by-line method to read, split and output my file. I only need my output to be the last three fields, without the quotes and without the thousands-delimiting commas.

I have a file of ArcGIS coordinates which contain quotes and commas internal to the fields. Data example below.

"0","0","1","1","1,058.83","1,455,503.936","5,173,996.331"

I have been trying to do this using variations on split( '","' , $line);. Here's my code.

use strict;
use warnings;

open (FH, '<', "DEM_Export.csv") or die "Can't open file DEM_Export.csv";

open (FH2, '>', "DEM_ExportProcessed.csv") or die "Can't open file DEM_ExportProcessed.csv"; 
print FH2 "EASTING, NORTHING, ELEVATION,\n";
my $count = 0;
foreach my $line (<FH>) {
    chomp;
    # if ($count == 0){next;}

    print $line, "\n";
    my @list = split( '","' , $line);
    print "1st print $list[5],$list[6],$list[4]\n";
    $list[4] =~ s/,//g;
    $list[5] =~ s/,//g;
    $list[6] =~ s/,//g;
    $list[4] =~ s/"//g;
    $list[5] =~ s/"//g;
    $list[6] =~ s/"//g;
    print "2nd print $list[5],$list[6],$list[4]\n";
    if ($count == 10) { 
        exit;
    }      
    my $string = sprintf("%.3f,%.3f,%.3f\n", $list[5],$list[6],$list[4]); 
    print FH2 $string;
    $count++;
}

close FH;
close FH2;

I'm getting close to my wits' end with this and really need a solution. Any help will be gratefully received. Cheers
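A note on the loop in the question: `foreach my $line (<FH>)` evaluates the read in list context, so Perl pulls every line of the file into memory before the first iteration, whereas `while (my $line = <$fh>)` reads one line per pass. The bare `chomp;` also acts on `$_` rather than on `$line`. Below is a minimal sketch of a streaming version of the same split idea. It assumes every field is double-quoted with no embedded quotes, the helper name `last_three` is hypothetical, and an in-memory filehandle stands in for DEM_Export.csv.

```perl
use strict;
use warnings;

# Hypothetical helper: return (easting, northing, elevation) from one raw line.
# Assumes every field is double-quoted and contains no embedded quotes.
sub last_three {
    my ($line) = @_;
    chomp $line;                      # chomp the variable we actually read into
    $line =~ s/^"//;                  # drop the quote before the first field
    $line =~ s/"$//;                  # drop the quote after the last field
    my @list = split /","/, $line;    # the remaining separators are all ","
    s/,//g for @list;                 # strip thousands commas
    return @list[ -2, -1, -3 ];
}

# In-memory stand-in for the real 8 GB file (sample row from the question).
my $sample = qq{"0","0","1","1","1,058.83","1,455,503.936","5,173,996.331"\n};
open my $in, '<', \$sample or die "Can't open in-memory file: $!";

# Scalar-context read: one line per iteration, nothing is slurped.
while ( my $line = <$in> ) {
    printf "%.3f,%.3f,%.3f\n", last_three($line);
}
close $in;
```

This only works while the quoting stays as regular as in the sample row; anything messier (escaped quotes, unquoted fields, embedded newlines) is a job for a real CSV parser.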

This is really very straightforward using Text::CSV to handle the nastiness of CSV data.

Here's an example, which works fine with the sample data you have shown. As long as your input file is plain ASCII and the rows are about the size you have shown, it should work fine.

It prints its output to STDOUT, so you'll want to use a command-line redirect to put it into the file you want.

use strict;
use warnings 'all';

use Text::CSV;

my $csv_file = 'DEM_Export.csv';

open my $in_fh, '<', $csv_file or die qq{Unable to open "$csv_file" for input: $!};

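# eol => "\n" makes print() terminate each output row with a newline
my $csv = Text::CSV->new({ eol => "\n" });

print "EASTING,NORTHING,ELEVATION\n";

# getline() reads and parses one row per call, so the file is never slurped
while ( my $row = $csv->getline($in_fh) ) {

   # Take the last three fields in easting, northing, elevation order and
   # strip the thousands commas (tr///r needs Perl 5.14 or later)
   $csv->print(\*STDOUT, [ map tr/,//dr, @$row[-2,-1,-3] ] );
}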

output

1455503.936,5173996.331,1058.83
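The densest part of that loop is the `map tr/,//dr, @$row[-2,-1,-3]` expression. As a small standalone sketch in plain core Perl, with a hand-written array standing in for the row that `getline` would return: the negative-index slice picks the last three fields in easting, northing, elevation order, and `tr/,//d` deletes the commas while the `r` flag returns the changed copy instead of modifying the field in place.

```perl
use strict;
use warnings;
use 5.014;    # tr///r (non-destructive transliteration) needs Perl 5.14+

# Stand-in for the array reference that $csv->getline would return
# for the sample row in the question.
my $row = [ '0', '0', '1', '1', '1,058.83', '1,455,503.936', '5,173,996.331' ];

# Slice from the end: -2 is easting, -1 is northing, -3 is elevation.
my @picked = @$row[ -2, -1, -3 ];

# Delete commas from a copy of each value; @$row itself is left untouched.
my @clean = map tr/,//dr, @picked;

print join( ',', @clean ), "\n";
```

Indexing from the end means the slice does not depend on how many leading columns the export contains, as long as the three values of interest stay last.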

I guess I should have been braver and had a crack at Text::CSV to start with rather than asking a question. Many thanks to Сухой27 and choroba for pointing me in the right direction.

Here is the code I ended up with. Probably not the tidiest.

use strict;
use warnings;
use Text::CSV;

my $file  = "DEM_Export.csv";
my $file2 = "DEM_ExportProcessed.csv";

open (FH2, '>', $file2) or die "Can't open file $file2: $!";
print FH2 "EASTING, NORTHING, ELEVATION,\n";
print "Starting file processing...\n";
my $csv = Text::CSV->new ({ binary => 1, eol => $/ });
open my $io, "<", $file or die "$file: $!";
while (my $row = $csv->getline ($io)) {
    my @fields = @$row;
    s/,//g for @fields[3..5];     
    my $string = sprintf("%.3f,%.3f,%.3f\n", $fields[4],$fields[5],$fields[3]); 
    print FH2 $string;
}
print "Finished!\n";
close $io;
close FH2 or die "Can't close file $file2: $!";

Worked a treat! Thank you.
