
Write 1D numpy arrays to csv file row by row

I am trying to take multiple csv files (each a 15 by 15 matrix), flatten each one into a 1D array, and then write these arrays row by row into a new csv file using Python.

An example of an input csv file:

0,1,1,1,1,1,1,1,1,0,0,0,0,0,0
0,0,1,0,0,0,0,0,1,0,0,0,0,0,0
....
....

This is the approach I am currently using:

import pandas as pd
import glob
import numpy as np

path = r'.../Model_AMs'

allFiles = glob.glob(path + "/*.csv")

for file_ in allFiles:
    df = pd.read_csv(file_, header=None).values.flatten()

    np.savetxt('trainingdata.csv', df, newline=" ", delimiter=',')

However when I open trainingdata.csv it looks like this:

0.000000000000000000e+00 1.000000000000000000e+00 1.000000000000000000e+00 1.000000000000000000e+00 1.000000000000000000e+00 1.000000000000000000e+00

It is not delimiting the elements with ',' and also adds a lot of 0s rather than simply keeping the values as 1s or 0s.

Any help would be appreciated. Thanks

At the moment you call np.savetxt once per file, and each call overwrites whatever the previous call wrote, so only the last array ends up in the file. As pointed out by @hpaulj in the comments, you could collect the data in a 2D array and write it in a single call instead.

The following shows an example for the 2D case:

import numpy as np

df = np.arange(15*15)
df = df.reshape(15,15)
print(df)

# default newline: each row of the 2D array becomes one comma-separated line
np.savetxt('trainingdata.csv', df, fmt='%i', delimiter=',')

The fmt='%i' argument writes the values as integers, as you wanted. If you really do want to overwrite the previous line each time, or need to write a 1D array on its own, it can be saved as follows:

import numpy as np

df = np.arange(15*15)
df = df.reshape(15,15)


for i in range(15):
    np.savetxt('trainingdata2.csv', [df[i]], fmt='%i', newline=" ", delimiter=',')

Note the [df[i]], which wraps the 1D array in a list so that np.savetxt sees a 2D array. This prevents the comma issue you described: [df[i]] tells np.savetxt that you want 1 row with 15 columns, whereas a plain 1D array with 15 elements is interpreted as 15 rows with 1 column each. The delimiter is only placed between columns, so with a single column it never appears. You did not notice this because you set newline=" ", which puts all the values on the same line of the file even though they are actually separate "lines" separated by spaces.
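The difference between the two shapes can be checked without touching the disk. The following is a small sketch that writes to in-memory buffers instead of files; the 4-element array and the variable names are made up for illustration.

```python
import io
import numpy as np

row = np.array([0, 1, 1, 0])

# 1D input: savetxt treats every element as its own row,
# so the delimiter is never used.
buf_1d = io.StringIO()
np.savetxt(buf_1d, row, fmt='%i', delimiter=',')
text_1d = buf_1d.getvalue()

# Wrapped in a list: one row with four columns,
# so the values are separated by commas.
buf_2d = io.StringIO()
np.savetxt(buf_2d, [row], fmt='%i', delimiter=',')
text_2d = buf_2d.getvalue()

print(repr(text_1d))  # '0\n1\n1\n0\n'
print(repr(text_2d))  # '0,1,1,0\n'
```

With newline=" " the first case would show up as "0 1 1 0 " on one line, which is what the output in the question looks like.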

The 2D array approach is neater, but here's a way to do it with pandas only:

import pandas as pd
import glob

path = r'.../Model_AMs'

allFiles = glob.glob(path + "/*.csv")

for file_ in allFiles:

    # transpose() is here to order values in same way as 
    # numpy's flatten(). astype() shouldn't be necessary,
    # but useful just in case pandas finds some floating
    # point values in your data
    df = pd.read_csv(file_, header=None).astype(int).transpose().melt()

    # writing in append mode
    pd.DataFrame(dict(zip(df.index, df.value)), index=[0]).to_csv(
      'trainingdata.csv', index=False, header=False, mode='a')
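For comparison, the 2D array approach mentioned earlier could be applied to the input files roughly as follows. This is only a sketch: the helper name write_flattened_rows is hypothetical, and it assumes every input file holds integer values and has the same number of elements (15 by 15 in the question).

```python
import numpy as np


def write_flattened_rows(csv_paths, out_path):
    """Flatten each CSV matrix and write one matrix per output row.

    Hypothetical helper; assumes integer data and equally sized files.
    """
    # Read each file as an integer matrix and flatten it in row-major order.
    rows = [np.loadtxt(p, delimiter=',', dtype=int).flatten()
            for p in csv_paths]

    # Stack the 1D arrays into a 2D array: one row per input file.
    stacked = np.vstack(rows)

    # A single savetxt call, so nothing gets overwritten.
    np.savetxt(out_path, stacked, fmt='%i', delimiter=',')
    return stacked
```

It could be called with the file list from the question, for example write_flattened_rows(sorted(allFiles), 'trainingdata.csv'). Sorting is optional; it only makes the row order reproducible.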
