Write 1D numpy arrays to csv file row by row

Question

I am trying to take multiple CSV files (15 by 15 matrices), flatten them into 1D arrays, and then write them row by row into a new CSV file using Python.

An example of an input csv file:

0,1,1,1,1,1,1,1,1,0,0,0,0,0,0
0,0,1,0,0,0,0,0,1,0,0,0,0,0,0
....
....

This is the approach I am currently using:

import pandas as pd
import glob
import numpy as np

path = r'.../Model_AMs'

allFiles = glob.glob(path + "/*.csv")

for file_ in allFiles:
    df = pd.read_csv(file_, header=None).values.flatten()

    np.savetxt('trainingdata.csv', df, newline=" ", delimiter=',')

However when I open trainingdata.csv it looks like this:

0.000000000000000000e+00 1.000000000000000000e+00 1.000000000000000000e+00 1.000000000000000000e+00 1.000000000000000000e+00 1.000000000000000000e+00

It is not delimiting the elements with ',' and also adds a lot of 0s rather than simply keeping the values as 1s or 0s.

Any help would be appreciated. Thanks

Answer 1

At the moment you are writing one flattened array at a time, and each call to np.savetxt reopens trainingdata.csv in write mode, so it overwrites what the previous iteration wrote. As pointed out by @hpaulj in the comments, you could use a 2D array instead.

The following shows an example for the 2D case:

import numpy as np

df = np.arange(15*15)
df = df.reshape(15,15)
print(df)

# the default newline='\n' puts each row of the 2D array on its own line
np.savetxt('trainingdata.csv', df, fmt='%i', delimiter=',')

The fmt argument formats the values as integers, as you wanted. If you really do want to overwrite the previous lines, or otherwise need to write a 1D array, it can be saved as follows:

import numpy as np

df = np.arange(15*15)
df = df.reshape(15,15)


# each call reopens the file in write mode, so only the last row survives
for i in range(15):
    np.savetxt('trainingdata2.csv', [df[i]], fmt='%i', newline=" ", delimiter=',')

Note the [df[i]], which wraps the 1D array in a list so that np.savetxt sees a 2D array. This prevents the comma issue you described: with [df[i]] you are telling np.savetxt that you want 1 row with 15 columns, whereas a plain 1D array with 15 elements is interpreted as 15 rows with 1 column each. With only one column per row, the delimiter is never used. You did not notice this because you set newline=' ', which puts all the values on the same line of the file even though they are actually separate "lines" separated by spaces.
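
To make the 1D versus 2D behaviour concrete, here is a minimal sketch that writes the same small array both ways into in-memory buffers. The array `row` and the buffer names are made up for illustration, and it assumes a reasonably recent numpy in which np.savetxt accepts a text-mode file object such as io.StringIO.

```python
import io
import numpy as np

row = np.array([0, 1, 1, 0])  # illustrative data, not from the question

# 1D input: savetxt treats every element as its own row, so the
# delimiter is never used and only `newline` separates the values.
buf_1d = io.StringIO()
np.savetxt(buf_1d, row, fmt='%i', delimiter=',', newline=' ')
print(repr(buf_1d.getvalue()))  # '0 1 1 0 '

# 2D input with a single row: the delimiter now separates the columns.
buf_2d = io.StringIO()
np.savetxt(buf_2d, [row], fmt='%i', delimiter=',')
print(repr(buf_2d.getvalue()))  # '0,1,1,0\n'
```

The first result reproduces the space-separated output from the question (minus the float formatting), while the second is the comma-separated row that was wanted.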

Answer 2

The 2D array approach is neater, but here's a way to do it with pandas only:

import pandas as pd
import glob

path = r'.../Model_AMs'

allFiles = glob.glob(path + "/*.csv")

for file_ in allFiles:

    # transpose() is here to order values in same way as 
    # numpy's flatten(). astype() shouldn't be necessary,
    # but useful just in case pandas finds some floating
    # point values in your data
    df = pd.read_csv(file_, header=None).astype(int).transpose().melt()

    # writing in append mode
    pd.DataFrame(dict(zip(df.index, df.value)), index=[0]).to_csv(
      'trainingdata.csv', index=False, header=False, mode='a')
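
For comparison, the 2D array approach mentioned at the top of this answer might look roughly like the sketch below. The helper name `flatten_csvs_to_rows` is hypothetical, and it reads the files with np.loadtxt rather than pd.read_csv purely to keep the example numpy-only; it also assumes every input file holds integers and has the same number of values.

```python
import glob
import numpy as np

def flatten_csvs_to_rows(pattern, out_path):
    """Hypothetical helper: flatten each CSV matching `pattern` into one
    row and write all rows to `out_path` with a single savetxt call."""
    files = sorted(glob.glob(pattern))  # sorted for a reproducible row order
    if not files:
        raise ValueError("no files match pattern: " + pattern)

    # One flattened 1D array per input file.
    rows = [np.loadtxt(f, delimiter=',', dtype=int).flatten() for f in files]

    # Stack into a 2D array of shape (number of files, values per file).
    data = np.vstack(rows)

    # A single write, so nothing is overwritten between files.
    np.savetxt(out_path, data, fmt='%i', delimiter=',')
    return data
```

With the variables from the question, this could be called as flatten_csvs_to_rows(path + "/*.csv", 'trainingdata.csv'). If the output file lives in the same folder as the inputs, it would be picked up by the pattern on a later run, so writing it elsewhere is safer.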

2 answers

solution1
1 2018-01-30 22:14:31

solution2
0 ACCEPTED 2018-01-30 22:48:44
