extract rows and filenames from multiple csv files

Question

I have multiple csv files with date as filename (20080101.csv to 20111031.csv) in a folder. The csv files have common headers. The csv file looks like this:

20080101.csv  
X ;Y; Z  
1 ; 1 ; 3  
1 ; 2 ; 6  
1 ; 3 ; 24  
2 ; 1 ; 24  
2 ; 2 ; 24  

20080102.csv   
X ;Y; Z  
1 ; 1 ; 0.1  
1 ; 2 ; 2  
1 ; 3 ; 67  
2 ; 1 ; 24  
2 ; 2 ; 24  

20080103.csv  
X ;Y; Z  
1 ; 1 ; 3  
1 ; 3 ; 24  
2 ; 1 ; 24  
2 ; 2 ; 24  

20080104.csv   
X ;Y; Z  
1 ; 1 ; 34  
1 ; 2 ; 23  
1 ; 3 ; 67  
2 ; 1 ; 24  
2 ; 2 ; 24

… and so on. I want to write a script that reads the rows of each file and, whenever a row has X=1 and Y=2, copies the whole row to a new csv file along with the filename (with NA where a file has no such row), giving the following output:

X ;Y ; Z ; filename  
1  ; 2 ; 6 ; 20080101  
1  ; 2 ; 2 ; 20080102  
1  ; 2 ; NA; 20080103  
1  ; 2 ; 23; 20080104

Any idea how this can be done? Suggestions about modules that I should look into, or any examples, would be welcome. Thanks for your time and help.

Cheers, Navin

Answer 1

This is a well-formed question, from which the logic should be apparent. For someone to provide finished code would defeat the purpose of the assignment. First, add a "homework" tag to the question, then think about what you want to do: 1) loop over the files (keeping track of each filename as it's opened), 2) read lines from the current file, 3) if the selection criterion (x == 1 and y == 2) is met, write the line.

To get started, try:

import csv, os

for fn in os.listdir():
    if fn.endswith(".csv"):  # match the extension only, not ".csv" anywhere in the name
        with open(fn, 'r', newline='') as f:
            reader = csv.reader(f, delimiter=";")
            for row in reader:
                ...

Then extend the solution to open the output file and write the selected lines using csv.writer.
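As a small illustration of the csv.writer half only (not the full solution), the sketch below writes a header and some already-selected rows using a semicolon delimiter. The function name write_selected and the sample row in the usage note are made up for illustration.

```python
import csv

def write_selected(path, rows):
    """Write a header plus the given rows to `path`, separated by ';'."""
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter=';')
        writer.writerow(['X', 'Y', 'Z', 'filename'])
        writer.writerows(rows)
```

For example, write_selected('output.csv', [['1', '2', '6', '20080101']]) would produce a two-line file; collecting the rows is still the part left to the reader loop above.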

Answer 2

You could read in one file at a time, line by line:

files = ['20080101.csv', '20080102.csv', '20080103.csv']  # ...etc
# Open the output once, in append mode, rather than once per matching row.
with open('output.csv', 'a') as fout:
    for f in files:
        with open(f, 'r') as infile:
            for line in infile:
                ray = [field.strip() for field in line.split(';')]
                # The header and blank lines never match '1' and '2'.
                if len(ray) > 2 and ray[0] == '1' and ray[1] == '2':
                    fout.write(' ; '.join(ray[:3] + [f]) + '\n')

Tested and works. May need some slight modifications.

Answer 3

This should do the job:

import glob
import os

with open('output.csv', 'w') as outfile:
    outfile.write('X ; Y ; Z ; filename\n')
    # Sort so the dated files are processed in chronological order.
    for filename in sorted(glob.glob('*.csv')):
        if filename == 'output.csv':  # Skip the file we're writing.
            continue
        name = os.path.splitext(filename)[0]  # Drop the '.csv' extension.
        count = 0
        with open(filename, 'r') as infile:
            next(infile, None)  # Skip the header line.
            for line in infile:
                if not line.strip():  # Skip blank lines.
                    continue
                fields = line.split(';')
                x = int(fields[0])
                y = int(fields[1])
                z = float(fields[2])
                if x == 1 and y == 2:
                    outfile.write('%d ; %d ; %g ; %s\n' % (x, y, z, name))
                    count += 1
        if count == 0:  # Handle the case when no lines were found.
            outfile.write('1 ; 2 ; NA ; %s\n' % name)

Note that if you can't control or trust the file format you may want to handle exceptions thrown by the conversions to int/float.
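One way to handle those exceptions is to wrap the conversions in a small helper that returns None for any line it cannot parse. This is only a sketch; the name parse_row is hypothetical and not part of the code above.

```python
def parse_row(line):
    """Return (x, y, z) parsed from an 'X ; Y ; Z' line, or None if malformed."""
    fields = line.split(';')
    try:
        return int(fields[0]), int(fields[1]), float(fields[2])
    except (ValueError, IndexError):
        # ValueError: a field is not numeric (e.g. the header or a blank line).
        # IndexError: the line has fewer than three fields.
        return None
```

The loop can then skip any line for which parse_row returns None instead of aborting on the first bad file.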

Answer 4

If you know that you have one file for each day, with no missing days, then I'd use glob('*.csv') to get the list of file names, open them one by one, then read like Tyler is doing.

If you know that there are days where the file is missing, I'd use datetime to start with datetime.date(2008,1,1) and loop, incrementing by one day. Then for each day I'd compose the file name using .strftime() + '.csv' and try to process the file (if there is no file, just write a record with NA).
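The date loop could be sketched roughly as follows, assuming the file names follow the YYYYMMDD.csv pattern from the question. The generator name expected_filenames is made up for illustration.

```python
import datetime

def expected_filenames(start, end):
    """Yield 'YYYYMMDD.csv' for every day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day.strftime('%Y%m%d') + '.csv'
        day += datetime.timedelta(days=1)
```

Each generated name can be checked with os.path.exists before opening it; when the file is absent, write the NA record for that day and move on.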

Answer 5

The following should work:

import csv
with open('output.csv', 'w') as outfile:
    outfile.write('X ; Y ; Z ; filename\n')
    fmt = '1 ; 2 ; %s ; %s\n'
    files = ['20080101.csv', '20080102.csv', '20080103.csv', '20080104.csv']
    for file in files:
        with open(file) as f:
            reader = csv.reader(f, delimiter=';')
            for row in reader:
                if len(row) > 2 and row[0].strip() == '1' and row[1].strip() == '2':
                    outfile.write(fmt % (row[2].strip(), file[:-4]))
                    break
            else:
                outfile.write(fmt % ('NA', file[:-4]))

5 answers

solution1
4 2011-11-05 00:08:54

solution2
2 2011-11-04 23:48:36

solution3
2 ACCEPTED 2011-11-04 23:58:58

solution4
0 2011-11-04 23:54:49

solution5
0 2011-11-05 00:02:17
