
Transposing data from a CSV file to CSV using Python or MATLAB

I am working with data in CSV format that has four columns and 912500 rows. I need to transpose the data in each column into 365 columns and 2500 rows in a separate CSV file. For example:

Col1    Col2  Col3  Col4
1       33    36    38
2       25    18    56
...
365     -4    -3    10
366     -11   20    35
367     12    18    27
...
730     26    36    27
...
912500  20    37    42

Desired output

Col1  Col2  Col3  Col4  Col5  ...  Col365
1     33    25    ...              -4
2     -11   12    ...              26
3
4     ...
5     ...
...
2500  ...

Please advise me on how to write a script for this. Any help will be highly appreciated.

Try using NumPy as suggested in the comments, but in case you want to code it yourself, here's one approach you could take:

  • Read the file one line at a time.
  • Split each line using the comma as the separator.
  • Discard the "row count" (the first element of the list returned by the split operation). You will have to maintain your own row count.
  • Copy the remaining elements to another list until you have 365 elements (including row count).
  • Write this list as CSV to the output file. You can use Python's built-in CSV writer (https://docs.python.org/2/library/csv.html).
  • Repeat until the whole input file has been processed.
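
The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: the function name regroup_lines and its parameters are made up for this example, the input is comma-separated, and the new row counter is written in front of values_per_row data values (pass 364 if the counter should count toward the 365). Note that, as the steps describe, the values of the remaining columns end up interleaved in one stream rather than handled column by column.

```python
import csv

def regroup_lines(in_path, out_path, values_per_row=365, has_header=False):
    """Hypothetical helper: drop the input row count, then rewrite the
    remaining values in rows of values_per_row, each with a new row count."""
    row_count = 0
    buffer = []
    with open(in_path) as infile, open(out_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        if has_header:
            next(infile, None)              # skip a header line (assumption)
        for line in infile:
            line = line.strip()
            if not line:
                continue                    # ignore blank lines
            fields = line.split(",")
            buffer.extend(fields[1:])       # discard the input row count
            while len(buffer) >= values_per_row:
                row_count += 1              # maintain our own row count
                writer.writerow([row_count] + buffer[:values_per_row])
                del buffer[:values_per_row]
        if buffer:                          # leftover values form a short last row
            row_count += 1
            writer.writerow([row_count] + buffer)
    return row_count
```

Calling regroup_lines("data.csv", "out.csv") would process the file in one pass while holding at most one output row (plus a few leftover values) in memory; the file names here are placeholders.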

csv.reader creates an iterator that reads the CSV file row by row. You can feed that into itertools.chain.from_iterable, which iterates over each row in turn, yielding the individual column values. Now that you have a stream of values, you can group them into new rows of the size you want. There are several ways to rebuild those rows; I used itertools.groupby in my example.

import itertools
import csv

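def groupby_count(iterable, count):
    # group consecutive items into tuples of `count` items; the last one may be shorter
    counter = itertools.count()
    for _, grp in itertools.groupby(iterable, lambda _: next(counter)//count):
        yield tuple(grp)

def reshape_csv(in_filename, out_filename, colsize):
    # newline='' (Python 3) leaves line-ending handling to the csv module
    with open(in_filename, newline='') as infile, open(out_filename, 'w', newline='') as outfile:
        # fields are space-delimited here, matching the test script; use ',' for comma-separated files
        reader = csv.reader(infile, delimiter=' ')
        writer = csv.writer(outfile, delimiter=' ')
        # flatten all rows into one stream of values, then regroup into rows of colsize
        col_iter = itertools.chain.from_iterable(reader)
        writer.writerows(groupby_count(col_iter, colsize))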
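
As one of the other ways to rebuild the rows, here is a minimal sketch based on itertools.islice. The name chunked is hypothetical and not part of the code above; it is meant as a possible drop-in for groupby_count, not as a better or more official approach.

```python
import itertools

def chunked(iterable, size):
    """Hypothetical alternative to groupby_count: yield tuples of up to
    `size` consecutive items; the last tuple may be shorter."""
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, size))
        if not chunk:       # the iterator is exhausted
            return
        yield chunk

print(list(chunked(range(7), 3)))
# → [(0, 1, 2), (3, 4, 5), (6,)]
```

Either helper consumes its input lazily, so the whole file never has to be held in memory at once.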

And here's an example script to test it. I used fewer columns, though:

import os
infn = "intest.csv"
outfn = "outtest.csv"
orig_colsize = 4
new_colsize = 15

# test input file
with open(infn, "w") as infp:
    for i in range(32):
        infp.write(' '.join('c{0:02d}_{1:02d}'.format(i,j) for j in range(orig_colsize)) + '\n')

# remove stale output file
try:
    os.remove(outfn)
except OSError:
    pass

# run it and print
reshape_csv(infn, outfn, new_colsize)
print('------- test output ----------')
with open(outfn) as outfp:
    print(outfp.read())

What follows was tested against a fake data file. It worked OK for me, but YMMV. Please see the inline comments for a description of how it works.

import csv

# we open the data file and put its content in data, that is a list of lists
with open('data.csv') as csvfile:
    data = [row for row in csv.reader(csvfile)]

# the following idiom transposes a list of lists
transpose = zip(*data)

# In Python 3, zip returns an iterator, so I use next() to throw away
# the first element, i.e., the column of the row numbers
next(transpose)

# I enumerate transpose, obtaining the data column by column
for nc, column in enumerate(transpose):

    # I prepare for writing to a csv file
    with open('trans%d.csv'%nc, 'w', newline='') as outfile:
        writer = csv.writer(outfile)

        # here, we have an idiom, sort of..., please see
        #   http://stupidpythonideas.blogspot.it/2013/08/how-grouper-works.html
        # for the reason why what we enumerate are the rows of your output file
        for nr, row in enumerate(zip(*[iter(column)]*365)):
            writer.writerow([nr+1,*row])
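
To see what the two zip idioms in the script do, here is a small sketch on made-up data (three rows and groups of 2 instead of 365). One thing worth knowing: plain zip silently drops an incomplete final group. With exactly 912500 values per column (2500 × 365) nothing is lost, but for other sizes itertools.zip_longest may be the safer choice.

```python
from itertools import zip_longest

data = [["1", "33", "36"], ["2", "25", "18"], ["3", "-4", "-3"]]

# zip(*data) pairs up the i-th field of every row, i.e. it transposes
columns = list(zip(*data))
print(columns[1])   # → ('33', '25', '-4')

# [iter(column)] * 2 holds the SAME iterator twice, so zip pulls
# consecutive items from it and yields them in groups of 2
column = columns[1]
groups = list(zip(*[iter(column)] * 2))
print(groups)       # → [('33', '25')]

# zip stops once the iterator runs dry, so the leftover '-4' is dropped;
# zip_longest keeps it and pads the last group instead
padded = list(zip_longest(*[iter(column)] * 2, fillvalue=""))
print(padded)       # → [('33', '25'), ('-4', '')]
```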
