
Transposing data from a CSV file to another CSV file using Python or MATLAB

I am working with data that has four columns and 912,500 rows in CSV format. I need to transpose the data in each column into 365 columns and 2,500 rows, written to a separate CSV file. E.g.:

Col1    Col2  Col3  Col4
1        33    36    38
2        25    18    56
...
365      -4    -3    10
366     -11    20    35
367      12    18    27
...
730      26    36    27
...
912500   20    37    42

Desired output (shown for one of the original columns):

       Col1  Col2  Col3 ... Col365
1       33    25   ...       -4
2      -11    12   ...       26
3      ...
...
2500   ...

Please advise me how to write a script for this. Any help would be highly appreciated.

Try using NumPy as suggested in the comments, but in case you want to code it yourself, here's one approach you could take:

  • Read the file one line at a time.
  • Split each line using the comma as the separator.
  • Discard the "row count" (the first element of the list you get from the split); you will have to maintain your own row count for the output.
  • Append the remaining elements to another list until it holds 365 elements.
  • Write this list as a CSV row to the output file; you can use Python's built-in CSV writer ( https://docs.python.org/2/library/csv.html ).
  • Repeat until the whole input file has been processed.
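A minimal sketch of those steps (the function name `reshape_stream` and the file paths are placeholders, and the buffer here mixes the three value columns together, exactly as the steps above describe):

```python
import csv

def reshape_stream(in_path, out_path, row_len=365):
    """Read line by line, drop the input row number, buffer the remaining
    values, and flush every `row_len` values as one output row prefixed
    with our own row count."""
    buffer = []
    out_row = 0                                   # our own row count
    with open(in_path) as infile, open(out_path, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        for line in infile:
            fields = line.rstrip('\n').split(',')
            buffer.extend(fields[1:])             # discard the input row count
            while len(buffer) >= row_len:
                out_row += 1
                writer.writerow([out_row] + buffer[:row_len])
                buffer = buffer[row_len:]
        if buffer:                                # flush any leftover values
            out_row += 1
            writer.writerow([out_row] + buffer)
```

With `row_len=365` and the 912,500-row input this yields the 2,500 output rows the question asks for, each prefixed with a fresh row number.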

csv.reader will create an iterator that reads the csv row by row. You can then feed that into itertools.chain, which iterates each row in turn, outputting individual values. Now that you have a flat stream of values, you can group them into new rows of the size you want. There are several ways to rebuild those rows; I used itertools.groupby in my example.

import itertools
import csv

def groupby_count(iterable, count):
    # label consecutive items 0,0,...,1,1,... and group by that label,
    # yielding fixed-size chunks of `count` items
    counter = itertools.count()
    for _, grp in itertools.groupby(iterable, lambda _: next(counter)//count):
        yield tuple(grp)

def reshape_csv(in_filename, out_filename, colsize):
    # newline='' is the recommended way to open files for the csv module
    with open(in_filename, newline='') as infile, \
         open(out_filename, 'w', newline='') as outfile:
        # a space delimiter matches the test script below; drop it for a
        # real comma-separated file
        reader = csv.reader(infile, delimiter=' ')
        writer = csv.writer(outfile, delimiter=' ')
        # flatten the rows into a single stream of values ...
        col_iter = itertools.chain.from_iterable(reader)
        # ... and regroup that stream into rows of the new width
        writer.writerows(groupby_count(col_iter, colsize))
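To see what groupby_count does on its own (the function is repeated here so the snippet runs stand-alone):

```python
import itertools

def groupby_count(iterable, count):
    # same helper as above, repeated for a self-contained demo
    counter = itertools.count()
    for _, grp in itertools.groupby(iterable, lambda _: next(counter)//count):
        yield tuple(grp)

print(list(groupby_count(range(10), 3)))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
```

Note the final chunk may be shorter than `count`; with 912,500 values and a chunk size of 365 that never happens, since 912500 divides evenly into 2,500 chunks.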

And here's an example script to test. I used fewer columns, though:

import os
infn = "intest.csv"
outfn = "outtest.csv"
orig_colsize = 4
new_colsize = 15

# test input file
with open(infn, "w") as infp:
    for i in range(32):
        infp.write(' '.join('c{0:02d}_{1:02d}'.format(i,j) for j in range(4)) + '\n')

# remove stale output file
try:
    os.remove(outfn)
except OSError:
    pass

# run it and print
reshape_csv(infn, outfn, new_colsize)
print('------- test output ----------')
print(open(outfn).read())

What follows was tested against a fake data file; it worked for me, but your mileage may vary. Please see the inline comments for a description of how it works.

import csv

# open the data file and read its content into data, a list of lists
with open('data.csv') as csvfile:
    data = [row for row in csv.reader(csvfile)]

# the following idiom transposes a list of lists
transpose = zip(*data)

# in Python 3 zip returns an iterator, so I use next() to throw away
# the first element, i.e., the column of row numbers
next(transpose)

# enumerate transpose, obtaining the data column by column
for nc, column in enumerate(transpose):

    # prepare to write one csv file per column; newline='' avoids the
    # blank lines the csv module otherwise produces on Windows
    with open('trans%d.csv' % nc, 'w', newline='') as outfile:
        writer = csv.writer(outfile)

        # zip(*[iter(column)]*365) is the "grouper" idiom; see
        #   http://stupidpythonideas.blogspot.it/2013/08/how-grouper-works.html
        # for why what we enumerate here are the rows of your output file
        for nr, row in enumerate(zip(*[iter(column)]*365)):
            writer.writerow([nr+1,*row])
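Since NumPy was suggested in the comments, the whole job can also be done with a single reshape per column. A sketch, assuming the file really has 912500 = 365 × 2500 numeric rows with a leading row-number column (the function name and file paths are placeholders):

```python
import numpy as np

def reshape_with_numpy(in_path, out_prefix, nrows=2500, ncols=365):
    """Reshape each data column of a CSV into an nrows x ncols table,
    one output file per column."""
    data = np.loadtxt(in_path, delimiter=',')      # shape (nrows*ncols, 4)
    for c in range(1, data.shape[1]):              # skip the row-number column
        # C-order reshape: the first 365 values become the first output row
        reshaped = data[:, c].reshape(nrows, ncols)
        np.savetxt('%s%d.csv' % (out_prefix, c), reshaped,
                   delimiter=',', fmt='%g')
```

Because `reshape` uses C (row-major) order, values 1–365 of a column land in output row 1 and values 366–730 in row 2, matching the desired layout in the question.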
