I am working with data in CSV format that has four columns and 912500 rows (912500 = 2500 × 365). I need to reshape the data in each column into 2500 rows of 365 columns, and write each column to a separate CSV file. For example:
Col1 Col2 Col3 Col4
1 33 36 38
2 25 18 56
...
365 -4 -3 10
366 -11 20 35
367 12 18 27
...
730 26 36 27
...
912500 20 37 42
Desired output
Col1 Col2 Col3 Col4 Col5 ... Col365
1 33 25 ... -4
2 -11 12 ... 26
3 ...
4 ...
5 ...
...
2500 ...
Could you please advise me on how to write a script for this? Any help will be highly appreciated.
Try using NumPy as suggested in the comments, but, just in case you want to code it yourself, here's one approach you could take:
You can read the file one line at a time
Split each line using the comma as the separator
Discard the "row count" (first element of the list you get as a result of the split operation). You will have to maintain your own row count.
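The steps above could be sketched roughly as follows, using only the standard library. This is a minimal sketch, not a tested solution for the real file: the function name reshape_columns, the out_pattern naming scheme and the has_header flag are all made up for illustration, and it assumes the input is comma-separated with the row count in the first field.

```python
import csv
from contextlib import ExitStack


def reshape_columns(in_path, out_pattern, width=365, has_header=True):
    """Stream in_path and write one CSV per data column, `width` values per row.

    out_pattern is a hypothetical naming scheme such as "col{}.csv".
    Returns the number of complete output rows written to each file.
    """
    rows_written = 0  # our own row count, replacing the one we discard
    with ExitStack() as stack:
        infile = stack.enter_context(open(in_path, newline=""))
        if has_header:
            next(infile, None)  # assumption: first line is "Col1,Col2,..."
        writers = None
        buffers = None
        for line in infile:
            line = line.strip()
            if not line:
                continue
            fields = line.split(",")[1:]  # drop the source row count
            if writers is None:
                # open one output file per data column, once we know how many
                writers = [
                    csv.writer(stack.enter_context(
                        open(out_pattern.format(i + 1), "w", newline="")))
                    for i in range(len(fields))
                ]
                buffers = [[] for _ in fields]
            for buf, value in zip(buffers, fields):
                buf.append(value)
            if len(buffers[0]) == width:
                # every buffer holds one full output row: flush them all
                rows_written += 1
                for writer, buf in zip(writers, buffers):
                    writer.writerow([rows_written] + buf)
                    buf.clear()
    return rows_written
```

Only `width` values per column are held in memory at any time, so the whole file never has to be loaded. Note that an incomplete final group (fewer than `width` values) is silently dropped in this sketch; with 912500 input rows and a width of 365 there is no remainder.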
csv.reader will create an iterator that reads the CSV row by row. You can then feed that into itertools.chain (via chain.from_iterable), which iterates over each row in turn, yielding the individual fields. Now that you have a stream of fields, you can group them into new rows of the size you want. There are several ways to rebuild those rows; I used itertools.groupby in my example.
import itertools
import csv

def groupby_count(iterable, count):
    # number the items 0, 1, 2, ... and group them by index // count
    counter = itertools.count()
    for _, grp in itertools.groupby(iterable, lambda _: next(counter)//count):
        yield tuple(grp)

def reshape_csv(in_filename, out_filename, colsize):
    # newline='' lets the csv module handle line endings itself
    with open(in_filename, newline='') as infile, \
            open(out_filename, 'w', newline='') as outfile:
        reader = csv.reader(infile, delimiter=' ')
        writer = csv.writer(outfile, delimiter=' ')
        # flatten all input rows into a single stream of fields
        col_iter = itertools.chain.from_iterable(reader)
        writer.writerows(groupby_count(col_iter, colsize))
And here's an example script to test. I used fewer columns, though:
import os

infn = "intest.csv"
outfn = "outtest.csv"
orig_colsize = 4
new_colsize = 15

# test input file
with open(infn, "w") as infp:
    for i in range(32):
        infp.write(' '.join('c{0:02d}_{1:02d}'.format(i,j) for j in range(4)) + '\n')

# remove stale output file
try:
    os.remove(outfn)
except OSError:
    pass

# run it and print
reshape_csv(infn, outfn, new_colsize)
print('------- test output ----------')
with open(outfn) as outfp:
    print(outfp.read())
What follows was tested against a fake data file. It worked OK for me, but YMMV. Please see the inline comments for a description of how it works.
import csv

# we open the data file and put its content in data, which is a list of lists
with open('data.csv', newline='') as csvfile:
    data = [row for row in csv.reader(csvfile)]

# the following idiom transposes a list of lists
transpose = zip(*data)

# I use Python 3, hence zip returns an iterator and I have to use next()
# to throw away the first element, i.e., the column of the row numbers
next(transpose)

# I enumerate transpose, obtaining the data column by column
for nc, column in enumerate(transpose):

    # I prepare for writing to a csv file
    with open('trans%d.csv'%nc, 'w', newline='') as outfile:
        writer = csv.writer(outfile)

        # here we have an idiom, sort of... please see
        # http://stupidpythonideas.blogspot.it/2013/08/how-grouper-works.html
        # for the reason why what we enumerate are the rows of your output file
        for nr, row in enumerate(zip(*[iter(column)]*365)):
            writer.writerow([nr+1,*row])
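In case the zip(*[iter(column)]*365) idiom looks opaque, here is a small standalone sketch of what it does, with a group size of 3 instead of 365. The helper names chunks and chunks_padded are mine, purely for illustration.

```python
from itertools import zip_longest


def chunks(values, size):
    """Group values into tuples of `size`, dropping an incomplete final group."""
    it = iter(values)
    # [it] * size is a list holding `size` references to the SAME iterator,
    # so zip pulls `size` consecutive items from it to build each tuple.
    return list(zip(*[it] * size))


def chunks_padded(values, size, fill=""):
    """Same grouping, but pad the final group with `fill` instead of dropping it."""
    it = iter(values)
    return list(zip_longest(*[it] * size, fillvalue=fill))


print(chunks(range(1, 8), 3))         # [(1, 2, 3), (4, 5, 6)]
print(chunks_padded(range(1, 8), 3))  # [(1, 2, 3), (4, 5, 6), (7, '', '')]
```

Plain zip stops as soon as the shared iterator runs dry, so a trailing partial group is discarded. That is harmless here because 912500 is an exact multiple of 365, but zip_longest is the safer choice if the row count might not divide evenly.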