Creating multiple csv files from data within a csv file

Question

System OSX or Linux

I'm trying to automate my work flow at work, each week I receive an excel file, which I convert to a csv.

An example is:

,,L1,,,L2,,,L3,,,L4,,,L5,,,L6,,,L7,,,L8,,,L9,,,L10,,,L11,
Title,r/t,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,neede d,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst
EXAMPLEfoo,60,6,6,6,0,0,0,0,0,0,6,6,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
EXAMPLEbar,30,6,6,12,6,7,14,6,6,12,6,6,12,6,8,16,6,7,14,6,7.5,15,6,6,12,6,8,16,6,0,0,6,7,14
EXAMPLE1,60,3,3,3,3,5,5,3,4,4,3,3,3,3,6,6,3,4,4,3,3,3,3,4,4,3,8,8,3,0,0,3,4,4
EXAMPLE2,120,6,6,3,0,0,0,6,8,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
EXAMPLE3,60,6,6,6,6,8,8,6,6,6,6,6,6,0,0,0,0,0,0,6,8,8,6,6,6,0,0,0,0,0,0,0,10,10
EXAMPLE4,30,6,6,12,6,7,14,6,6,12,6,6,12,3,5.5,11,6,7.5,15,6,6,12,6,0,0,6,9,18,6,0,0,6,6.5,13

And so you can get a picture of how it looks in excel:

What I need to do, is create multiple csv files for each instance in row 1, so L1, L2, L3, L4...

And within that each csv file it needs to contain the title, r/t, needed

So for L1 an example out put would look like:

EXAMPLEfoo,60,6
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,6
EXAMPLE3,60,6
EXAMPLE4,30,6

And for L2:

EXAMPLEfoo,60,0
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,0
EXAMPLE3,60,6
EXAMPLE4,30,6

And so on.

I have tried playing around with sed and awk and hit google but I have found nothing that really solves the issue.

I'd imagine perl would be particular suited to this or maybe python, so I would be more than happy to accept suggestions from users.

So, any suggestions?

Thanks in advance.

Answer 1

Perl "one-liner"

perl -MText::CSV_XS -e'$c=Text::CSV_XS->new({binary=>1,eol=>"\n"});%a=map{$i++;/^L\d+$/?($_=>$i):()}@{$c->getline(*ARGV)};open$b{$_},">$_"for keys%a;while($f=$c->getline(*ARGV)){$c->print($b{$_},[@$f[0,1,$a{$_}]])for keys%a}'

For ones which have problem with reading:

$ echo '$c=Te...' | perltidy
$c = Text::CSV_XS->new( { binary => 1, eol => "\n" } );
%a = map { $i++; /^L\d+$/ ? ( $_ => $i ) : () } @{ $c->getline(*ARGV) };
open $b{$_}, ">$_" for keys %a;
while ( $f = $c->getline(*ARGV) ) {
    $c->print( $b{$_}, [ @$f[ 0, 1, $a{$_} ] ] )
      for keys %a;
}

Answer 2

Using only AWK:

awk -F, -vOFS=, -vc=1 '
    NR == 1 {
        for (i=1; i<NF; i++) {
            if ($i != "") {
                g[c]=i;
                f[c++]=$i
            }
        }
    }
    NR>2 {
        for (i=1; i < c; i++) {
            print $1,$2, $g[i] > "output_"f[i]".csv"
        }
    }' data.csv

As a one-liner:

awk -F, -vOFS=, -vc=1 'NR == 1 {for (i=1; i<NF; i++) {if ($i != "") {g[c]=i; f[c++]=$i}}} NR>2 { for (i=1; i < c; i++) {print $1,$2, $g[i] > "file_"f[i]".csv" }}' data.csv

Example output:

$ cat file_L1.csv
EXAMPLEfoo,60,6
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,6
EXAMPLE3,60,6
EXAMPLE4,30,6
$ cat file_L2.csv
EXAMPLEfoo,60,0
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,0
EXAMPLE3,60,6
EXAMPLE4,30,6
$ cat file_L11.csv
EXAMPLEfoo,60,0
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,0
EXAMPLE3,60,0
EXAMPLE4,30,6

Answer 3

use strict;
use warnings;

use Text::CSV;
my $csv = Text::CSV->new;

sub parse_line {
    $csv->parse(shift) or die $!;
    return $csv->fields;
}

my @metadata;
my @files  = parse_line(scalar <>);
my @header = parse_line(scalar <>); # Ignore.
for my $i (0 .. $#files){
    next unless length $files[$i];
    open(my $h, '>', "$files[$i].csv") or die $!;
    push @metadata, {column => $i, handle => $h};
}

while (my $line = <>){
    my @fields = parse_line($line);
    for my $m (@metadata){
        $csv->print($m->{handle}, [ @fields[0, 1, $m->{column}] ]);
        print {$m->{handle}} "\n";
    }
}

Answer 4

try this

#!/bin/bash
awk 'BEGIN{ OFS=FS="," }
NR==1{
 for(i=1;i<=NF;i++){
   if($i){ f[i]=$i }
 }
}
NR>2{ for(o in f){ print $1,$2, $o > "file_"f[o]".csv" } } ' file

output

$ cat file_L1.csv
EXAMPLEfoo,60,6
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,6
EXAMPLE3,60,6
EXAMPLE4,30,6

$ cat file_L2.csv
EXAMPLEfoo,60,0
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,0
EXAMPLE3,60,6
EXAMPLE4,30,6

Answer 5

In Python, slightly hacky and untested, but should do the job:

import csv
r = csv.reader(open(r'file.csv'), dialect='excel')
topline = r.next()
headerline = r.next()

lastcell = ''
for i, cell in enumerate(topline): #Copy cells forwards in the top line, so L1 for example goes across all cells
    if cell == '':
        topline[i] = lastcell
    else:
        lastcell = cell

for i in range(len(headerline)): #Copy the topline cells into the header line, so the headerline cells should be unique
    headerline[i] = '-'.join((topline[i], headerline[i]))

rows = [dict(zip(headerline, line)) for line in r]

# Rows should now consist of dicts of the form {'Title': 'EXAMPLEfoo', 'r/t': '60', 'L1-needed': '6' ...}

for lval in frozenset(topline): #Use frozenset to ensure we only have unique values.
    if lval != '': #Make sure we don't look at the blank value
        w = csv.writer(open(r'%s.csv' % lval, 'w'), dialect='excel')
        for row in rows:
            line = [row['Title'], row['r/t'], row['-'.join((lval, 'needed'))]]
            w.writerow(line)

Answer 6

Have a look at perl module Text::CSV_XS - comma-separated values manipulation routines. I found this module very helpful while manipulating with CSV files.

Creating multiple csv files from data within a csv file

Question

6 answers

solution1
3 2010-04-12 13:09:27

solution2
2 ACCPTED 2010-04-12 13:10:42

solution3
2 2010-04-12 16:19:05

solution4
1 2010-04-12 11:41:25

solution5
0 2010-04-12 11:43:01

solution6
0 2010-04-12 13:12:36

Creating multiple csv files from data within a csv file

Question

6 answers

solution1 3 2010-04-12 13:09:27

solution2 2 ACCPTED 2010-04-12 13:10:42

solution3 2 2010-04-12 16:19:05

solution4 1 2010-04-12 11:41:25

solution5 0 2010-04-12 11:43:01

solution6 0 2010-04-12 13:12:36

solution1
3 2010-04-12 13:09:27

solution2
2 ACCPTED 2010-04-12 13:10:42

solution3
2 2010-04-12 16:19:05

solution4
1 2010-04-12 11:41:25

solution5
0 2010-04-12 11:43:01

solution6
0 2010-04-12 13:12:36