Python & Pandas: How can I skip creating an intermediate data file when reading data?

Question

I have data files that look like this:

ABE200501.dat
ABE200502.dat
ABE200503.dat
...

So I first combine these files into all.dat and do a little clean-up:

fout=open("all.dat","w")
for year in range(2000,2017):
    for month in range(1,13):
        try: 
            for line in open("ABE"+ str(year) +"%02d"%(month)+".dat"):
                fout.write(line.replace("[", " ").replace("]", " ").replace('"', " ").replace('`', " "))
        except IOError:  # no file for this year/month, skip it
            pass
fout.close()

Later, I read the final file with pandas:

df = pd.read_csv("all.dat", skipinitialspace=True, error_bad_lines=False, sep=' ',
                    names = ['stationID','time','vis','day_type','vis2','day_type2','dir','speed','dir_max','speed_max','visual_range', 'unknown'])

I want to know whether it is possible to keep the combined data directly in RAM instead of on my hard disk. That would save a lot of unnecessary disk space.

Thanks!

Answer 1

The StringIO module lets you treat strings as files. (This is the Python 2 module; in Python 3 the equivalent class is io.StringIO.)

Example from the docs:

import StringIO

output = StringIO.StringIO()
output.write('First line.\n')
print >>output, 'Second line.'

# Retrieve file contents -- this will be
# 'First line.\nSecond line.\n'
contents = output.getvalue()

# Close object and discard memory buffer --
# .getvalue() will now raise an exception.
output.close()

For your own code:

fout = StringIO.StringIO()
# treat fout as a file handle like usual
# parse input files, writing to fout

contents = fout.getvalue()  # the combined, cleaned text as one string
fout.close()

# ...

# Wrap the string in a new file-like object that pandas can read from.
# (Python 2's StringIO.StringIO does not support the "with" statement.)
fin = StringIO.StringIO(contents)
df = pd.read_csv(fin, skipinitialspace=True, error_bad_lines=False, sep=' ', names = ['stationID','time','vis','day_type','vis2','day_type2','dir','speed','dir_max','speed_max','visual_range', 'unknown'])
fin.close()

pandas accepts both pathname strings and file handles as input.
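
For anyone on Python 3, here is a minimal sketch of the same idea using io.StringIO. It is not the asker's exact code. The helper names combine_in_memory and load_station are made up for illustration, and it assumes a glob pattern such as "ABE*.dat" is an acceptable replacement for the year/month loop, since sorted file names of the form ABEYYYYMM.dat come out in chronological order. The error_bad_lines option is left out because newer pandas versions replaced it with on_bad_lines="skip"; add whichever one your version supports if your data has malformed rows.

```python
import glob
import io
import os

import pandas as pd

COLUMNS = ['stationID', 'time', 'vis', 'day_type', 'vis2', 'day_type2',
           'dir', 'speed', 'dir_max', 'speed_max', 'visual_range', 'unknown']


def combine_in_memory(pattern):
    """Concatenate all files matching `pattern` into one in-memory text buffer."""
    buffer = io.StringIO()
    # Sorted names put ABE200501.dat before ABE200502.dat, and so on.
    for path in sorted(glob.glob(pattern)):
        with open(path) as fin:
            for line in fin:
                # Same clean-up as in the question: replace brackets,
                # double quotes and backticks with spaces.
                for ch in '[]"`':
                    line = line.replace(ch, ' ')
                buffer.write(line)
    buffer.seek(0)  # rewind so the next reader starts at the beginning
    return buffer


def load_station(pattern):
    """Parse the combined buffer with pandas; nothing is written to disk."""
    return pd.read_csv(combine_in_memory(pattern), sep=' ',
                       skipinitialspace=True, names=COLUMNS)
```

Calling load_station("ABE*.dat") from the data directory should then give the same DataFrame as reading all.dat, without the intermediate file. Note that the whole combined text still has to fit in memory alongside the DataFrame, and that read_csv raises an error if the pattern matches no files and the buffer is empty.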
