Write 1D numpy arrays to csv file row by row

I am trying to take multiple csv files (15 by 15 matrices), flatten each one into a 1D array, and then write them row by row into a new csv file using Python.

An example of an input csv file:

0,1,1,1,1,1,1,1,1,0,0,0,0,0,0
0,0,1,0,0,0,0,0,1,0,0,0,0,0,0
....
....

This is the approach I am currently using:

import pandas as pd
import glob
import numpy as np

path = r'.../Model_AMs'

allFiles = glob.glob(path + "/*.csv")

for file_ in allFiles:
    df = pd.read_csv(file_, header=None).values.flatten()

    np.savetxt('trainingdata.csv', df, newline=" ", delimiter=',')

However, when I open trainingdata.csv it looks like this:

0.000000000000000000e+00 1.000000000000000000e+00 1.000000000000000000e+00 1.000000000000000000e+00 1.000000000000000000e+00 1.000000000000000000e+00

The elements are not delimited with ',' and the values are written with a lot of extra 0s instead of simply staying 1 or 0.

Any help would be appreciated. Thanks

At the moment, each call to np.savetxt writes one flattened array to the file and, in doing so, overwrites what the previous call wrote. As @hpaulj pointed out in the comments, you could consider using a 2D array instead.

The following shows an example for the 2D case:

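import numpy as np

# stand-in data: a 15 x 15 array of integers
df = np.arange(15*15)
df = df.reshape(15,15)
print(df)

# default newline, so each row of the array becomes one line of the file
np.savetxt('trainingdata.csv', df, fmt='%i', delimiter=',')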

The fmt argument formats the values as integers, as you wanted. If you really do want to overwrite the previous lines, or otherwise need to pass a 1D array, it can be saved as follows:

import numpy as np

df = np.arange(15*15)
df = df.reshape(15,15)

for i in range(15):
    # each call reopens the file for writing, so only the last row remains
    np.savetxt('trainingdata2.csv', [df[i]], fmt='%i', newline=" ", delimiter=',')

Note the [df[i]], which turns the 1D array into a 2D array with a single row before it is written to the file. This prevents the comma issue you described: with [df[i]] you are telling np.savetxt that you want 1 row with 15 columns. If you pass a plain 1D array with 15 elements, it is interpreted as 15 rows with 1 column each, so the delimiter is never used. You did not notice this because you set newline=' ', which puts the values on the same line of the file even though they are actually separate "lines" separated by spaces.
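The difference between the two shapes can be checked without touching the disk. The following is a minimal sketch that writes the same values to in-memory buffers, once as a 1D array and once as a one-row 2D array; the names row, text_1d and text_2d are made up for this illustration.

```python
import io

import numpy as np

row = np.array([0, 1, 1, 0])

# 1D input: savetxt treats every element as its own row,
# so the delimiter is never used.
buf_1d = io.StringIO()
np.savetxt(buf_1d, row, fmt='%i', delimiter=',')
text_1d = buf_1d.getvalue()

# 2D input with a single row: the elements become columns of one line.
buf_2d = io.StringIO()
np.savetxt(buf_2d, row.reshape(1, -1), fmt='%i', delimiter=',')
text_2d = buf_2d.getvalue()

print(repr(text_1d))  # '0\n1\n1\n0\n'
print(repr(text_2d))  # '0,1,1,0\n'
```

Here row.reshape(1, -1) has the same effect as wrapping the array in a list, as [df[i]] does above.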

The 2D array approach is neater, but here's a way to do it with pandas only:

import pandas as pd
import glob

path = r'.../Model_AMs'

allFiles = glob.glob(path + "/*.csv")

for file_ in allFiles:

    # transpose() is here to order values in same way as 
    # numpy's flatten(). astype() shouldn't be necessary,
    # but useful just in case pandas finds some floating
    # point values in your data
    df = pd.read_csv(file_, header=None).astype(int).transpose().melt()

    # writing in append mode
    pd.DataFrame(dict(zip(df.index, df.value)), index=[0]).to_csv(
      'trainingdata.csv', index=False, header=False, mode='a')
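For comparison, the 2D array approach applied to a folder of csv files might look like the sketch below. It is not taken from either answer: flatten_csvs_to_rows is a hypothetical helper name, the demo uses two tiny 2 x 2 matrices in a temporary directory rather than the real 15 x 15 files, and it assumes every input file holds the same number of integer values and that at least one file exists.

```python
import glob
import os
import tempfile

import numpy as np


def flatten_csvs_to_rows(in_dir, out_path):
    """Write each csv matrix in in_dir as one flattened row of out_path."""
    # sorted() makes the row order reproducible; glob order is arbitrary.
    files = sorted(glob.glob(os.path.join(in_dir, "*.csv")))
    # ravel() flattens row by row, like flatten() in the question.
    rows = [np.loadtxt(f, delimiter=",", dtype=int).ravel() for f in files]
    # Stack into shape (n_files, n_values) and write the file once.
    np.savetxt(out_path, np.vstack(rows), fmt="%i", delimiter=",")


# Demo: two small input matrices, kept apart from the output file
# so that the output is never picked up by the glob pattern.
with tempfile.TemporaryDirectory() as tmp:
    in_dir = os.path.join(tmp, "in")
    os.mkdir(in_dir)
    np.savetxt(os.path.join(in_dir, "a.csv"), [[0, 1], [1, 0]],
               fmt="%i", delimiter=",")
    np.savetxt(os.path.join(in_dir, "b.csv"), [[1, 1], [0, 0]],
               fmt="%i", delimiter=",")
    out_path = os.path.join(tmp, "trainingdata.csv")
    flatten_csvs_to_rows(in_dir, out_path)
    with open(out_path) as fh:
        lines = fh.read().splitlines()

print(lines)  # ['0,1,1,0', '1,1,0,0']
```

Because the output is written in a single call, nothing is overwritten between files and no append mode is needed.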
