I am trying to take multiple CSV files (15 by 15 matrices), flatten them into 1D arrays and then write them row by row into a new CSV file using Python.
An example of an input csv file:
0,1,1,1,1,1,1,1,1,0,0,0,0,0,0
0,0,1,0,0,0,0,0,1,0,0,0,0,0,0
....
....
This is the approach I am currently using:
import pandas as pd
import glob
import numpy as np
path = r'.../Model_AMs'
allFiles = glob.glob(path + "/*.csv")
for file_ in allFiles:
    df = pd.read_csv(file_, header=None).values.flatten()
    np.savetxt('trainingdata.csv', df, newline=" ", delimiter=',')
However, when I open trainingdata.csv it looks like this:
0.000000000000000000e+00 1.000000000000000000e+00 1.000000000000000000e+00 1.000000000000000000e+00 1.000000000000000000e+00 1.000000000000000000e+00
It does not separate the elements with ',' and it writes the values in scientific notation with many trailing zeros (e.g. 1.000000000000000000e+00) rather than simply keeping them as 1s and 0s.
Any help would be appreciated. Thanks
At the moment you are writing one line at a time into your file, and each call to np.savetxt overwrites whatever the previous call wrote. As pointed out by @hpaulj in the comments, you could consider using a 2D array instead.
The following shows an example for the 2D case:
import numpy as np
df = np.arange(15*15)
df = df.reshape(15, 15)
print(df)
# fmt='%i' writes integers; the default newline='\n' puts each row on its own line
np.savetxt('trainingdata.csv', df, fmt='%i', delimiter=',')
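Applied to the files from the question, the same 2D idea could look like the sketch below. It rests on a few assumptions: it swaps in np.loadtxt for pd.read_csv (the input files are assumed to be plain numeric CSVs without a header), stacks one flattened row per file with np.vstack, and writes everything with a single np.savetxt call. The helper names stack_flattened_csvs and write_training_data are hypothetical, and the path is the question's placeholder, left as-is.

```python
import glob
import os

import numpy as np


def stack_flattened_csvs(paths):
    """Hypothetical helper: read each CSV as an integer matrix, one flattened row per file."""
    # np.loadtxt with delimiter=',' parses a plain numeric CSV without a header
    rows = [np.loadtxt(p, delimiter=',', dtype=int).flatten() for p in paths]
    # np.vstack turns the list of 1D rows into a 2D array of shape (n_files, n_values)
    return np.vstack(rows)


def write_training_data(paths, out_path='trainingdata.csv'):
    """Hypothetical helper: write one comma-separated line of integers per input file."""
    data = stack_flattened_csvs(paths)
    # fmt='%i' avoids the default '%.18e'; the default newline='\n' ends each row
    np.savetxt(out_path, data, fmt='%i', delimiter=',')
    return data


if __name__ == '__main__':
    path = r'.../Model_AMs'  # placeholder path from the question
    # sorted() only makes the row order reproducible; glob's order is arbitrary
    all_files = sorted(glob.glob(os.path.join(path, '*.csv')))
    if all_files:
        write_training_data(all_files)
```

Because all rows are written in one call, the file is opened only once and nothing is overwritten.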
The fmt argument formats the values as integers, as you wanted. If you really want to overwrite the previous lines, or otherwise want to work with a 1D array, it can be saved as follows:
import numpy as np
df = np.arange(15*15)
df = df.reshape(15, 15)
for i in range(15):
    # each call reopens the file in write mode, so only the last row survives
    np.savetxt('trainingdata2.csv', [df[i]], fmt='%i', newline=" ", delimiter=',')
Note the [df[i]], which effectively turns the 1D array into a 2D array before writing it to the file. This prevents the comma issue you described. The reason is that by using [df[i]] you are telling np.savetxt that you want 1 row with 15 columns. If you pass a plain 1D array with 15 elements, it is interpreted as 15 rows with 1 column each. You did not notice that because you set newline = ' ', which puts them on the same line of the file even though they are actually multiple "lines" separated by spaces.
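To see this row/column behaviour (and where the e+00 values come from) without touching any files, np.savetxt can write into an in-memory io.StringIO buffer. The helper savetxt_to_str is hypothetical, introduced only for this illustration. The last part shows that passing one open handle to repeated savetxt calls appends rows instead of overwriting them, which is one way to keep every row in a loop like the one above.

```python
import io

import numpy as np

row = np.array([0, 1, 1, 1, 0])


def savetxt_to_str(arr, **kwargs):
    """Hypothetical helper: run np.savetxt into a buffer and return the text it wrote."""
    buf = io.StringIO()
    np.savetxt(buf, arr, **kwargs)
    return buf.getvalue()


# A 1D array is treated as a column: one value per line, so no commas appear.
print(repr(savetxt_to_str(row, fmt='%i', delimiter=',')))  # prints '0\n1\n1\n1\n0\n'

# Wrapping it in a list gives a 1 x 5 2D array: one line with comma separators.
print(repr(savetxt_to_str([row], fmt='%i', delimiter=',')))  # prints '0,1,1,1,0\n'

# Without fmt, savetxt falls back to its default '%.18e', which is where the
# 0.000000000000000000e+00 values in the question come from.
print(savetxt_to_str([row], delimiter=','))

# One open handle reused across calls: each call appends a row to the same stream
# (a file opened once with open('trainingdata2.csv', 'w') behaves the same way).
buf = io.StringIO()
for r in np.arange(6).reshape(2, 3):
    np.savetxt(buf, [r], fmt='%i', delimiter=',')
print(repr(buf.getvalue()))  # prints '0,1,2\n3,4,5\n'
```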
The 2D array approach is neater, but here's a way to do it with pandas only:
import pandas as pd
import glob
path = r'.../Model_AMs'
allFiles = glob.glob(path + "/*.csv")
for file_ in allFiles:
    # transpose() is here to order values in the same way as
    # numpy's flatten(). astype() shouldn't be necessary,
    # but is useful just in case pandas finds some floating
    # point values in your data
    df = pd.read_csv(file_, header=None).astype(int).transpose().melt()
    # writing in append mode
    pd.DataFrame(dict(zip(df.index, df.value)), index=[0]).to_csv(
        'trainingdata.csv', index=False, header=False, mode='a')
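As a quick check of the transpose() comment, the sketch below uses a hypothetical 3 x 3 frame in place of a 15 x 15 file and compares the melted values with numpy's flatten(). It also shows an alternative sketch, not taken from the answer above: flatten each frame directly and write all rows in a single to_csv call, which avoids reopening the output in append mode for every input file. Here to_csv returns a string because no path is given; passing a file name such as 'trainingdata.csv' would write the file instead.

```python
import numpy as np
import pandas as pd

# Hypothetical 3 x 3 stand-in for one of the 15 x 15 input matrices.
frame = pd.DataFrame([[0, 1, 1], [0, 0, 1], [1, 0, 0]])

# transpose() turns each original row into a column, and melt() stacks those
# columns one after another, so the original rows come out in order...
melted = frame.astype(int).transpose().melt()["value"].to_numpy()
# ...which is the same row-major order that numpy's flatten() produces.
flat = frame.to_numpy().flatten()
print(melted.tolist())  # prints [0, 1, 1, 0, 0, 1, 1, 0, 0]
print(bool((melted == flat).all()))  # prints True

# Alternative: one flattened row per frame, written with a single to_csv call.
rows = [frame.to_numpy().flatten(), (1 - frame).to_numpy().flatten()]
csv_text = pd.DataFrame(rows).to_csv(index=False, header=False)
print(csv_text)
```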