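Could you help me, please?
I need to convert several input files, each containing one row of semicolon-separated values, into a single output file in which each input file becomes one column.
Input files:
File 1: 32676;;90;5;22;...;4
File 2: 255;35;88;17;;...;151
File 3: 551;86;442;;78;...;20
Output file: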
32676;255;551
;35;86
90;88;442
5;17;
22;;78
...;...;...
4;151;20
Thanks a lot for your help!
My code:
import csv
import glob
import os

path = r'D:\Users\mim\Desktop\SI\Test_cvs'
pathglobalcsv = r'D:\Users\mim\Desktop\SI'
# create a new file
globalfile = open(os.path.join(pathglobalcsv, 'global.csv'), 'w+')
# write filenames as column names
files = os.listdir(path)
globalfile.write(';'.join(files))
globalfile.write('\n')
# get all values
for filename in glob.glob(os.path.join(path, '*.csv')):
    csvfile = open(filename, 'r')
    textcsv = csv.reader(csvfile, delimiter=';')
    globalfile.write(zip(*textcsv))
I get this error:
Traceback (most recent call last):
File "C:\Users\mim\eclipse-workspace\test\csv_global.py", line 86, in <module>
globalfile.write(zip(*textcsv))
TypeError: expected a string or other character buffer object
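For reference: file.write() accepts only a string, whereas zip(*textcsv) yields tuples of fields (a list of tuples in Python 2, an iterator in Python 3), which is what triggers the TypeError. A minimal sketch of the usual fix, assuming Python 3, hands the transposed tuples to csv.writer, whose writerows() formats each tuple as one delimited line. The io.StringIO objects and the sample values are hypothetical stand-ins for the real files.

```python
import csv
import io

# Hypothetical stand-in for an input file with two ';'-separated rows
source = io.StringIO("1;2;3\n4;5;6\n")
rows = list(csv.reader(source, delimiter=';'))  # [['1', '2', '3'], ['4', '5', '6']]

# Hypothetical stand-in for the output file
out = io.StringIO()
writer = csv.writer(out, delimiter=';', lineterminator='\n')
# zip(*rows) yields tuples such as ('1', '4'); writerows() turns each into one line
writer.writerows(zip(*rows))

result = out.getvalue()
print(result)
```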
I found one possible solution:
globalfile = open(os.path.join(pathglobalcsv, 'global.csv'), 'wb')
for filename in glob.glob(os.path.join(path, '*.csv')):
    csvfile = open(filename, 'r')
    with csvfile:
        textcsv = csv.reader(csvfile, delimiter=';')
        for row in textcsv:
            textlist = zip(list(row))
            column = pd.DataFrame(textlist)
            column.to_csv(globalfile, sep=';', header=False, index=False)
But the actual result is:
32676
90
5
22
...
4
255
35
88
17
...
151
How can I start writing the values from the second file in a new column, next to 32676? Thanks a lot!
--- * . * ---
[CORRECT ANSWER]:
import csv
import glob
import os
import pandas as pd

path = r'D:\Users\mim\Desktop\SI\Test'
pathglobalcsv = r'D:\Users\mim\Desktop\SI'
outputfile = os.path.join(pathglobalcsv, 'global.csv')
# list the input files once, sorted, so the header order matches the column order
csvfiles = sorted(glob.glob(os.path.join(path, '*.csv')))
# write filenames as column names (escaping any ';' they contain)
header = [os.path.basename(name).replace(';', r'\;') for name in csvfiles]
# collect one column per row of every input file
columns = []
for filename in csvfiles:
    with open(filename, 'r') as csvfile:
        textcsv = csv.reader(csvfile, delimiter=';')
        for row in textcsv:
            # each value of the row becomes one line of a single-column DataFrame
            columns.append(pd.DataFrame(row))
# place the columns side by side (shorter ones are padded with empty cells)
# and write them to one common csv file
outputDF = pd.concat(columns, axis=1)
outputDF.to_csv(outputfile, sep=';', header=header, index=False)
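A small sketch of why pd.concat(..., axis=1) suits this job: it places the single-column DataFrames side by side and aligns them on their row index, so a shorter file is padded with NaN, which to_csv writes as an empty field. The two rows and the a.csv/b.csv header names are made up for illustration, and io.StringIO replaces the output file.

```python
import io

import pandas as pd

# Hypothetical rows read from two input files of different lengths
row_a = ['32676', '', '90']
row_b = ['255', '35']

# Each row becomes a single-column DataFrame with index 0..n-1
columns = [pd.DataFrame(row_a), pd.DataFrame(row_b)]

# axis=1 puts the columns side by side, aligned on the index;
# the missing third value of row_b becomes NaN
combined = pd.concat(columns, axis=1)

out = io.StringIO()
# NaN is written as an empty field by default; header replaces the column names
combined.to_csv(out, sep=';', header=['a.csv', 'b.csv'], index=False)
print(out.getvalue())
```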
A few hints on how to use zip to merge data as well as to transpose lists. It sounds like how to transpose a CSV file is your actual question. The answer is to get the data into a list of lists (for example via the csv module), transpose that, and write it back to a file if needed.
row1 = [1,2,3]
row2 = ['a', 'b', 'c']
list(zip(row1, row2))
Out[45]: [(1, 'a'), (2, 'b'), (3, 'c')]
z = list(zip(row1, row2))
list(zip(*z))
Out[47]: [(1, 2, 3), ('a', 'b', 'c')]
y = list(zip(*z))
y
Out[49]: [(1, 2, 3), ('a', 'b', 'c')]
list(zip(*y))
Out[50]: [(1, 'a'), (2, 'b'), (3, 'c')]
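Putting the csv module and zip together, a minimal read/transpose/write round trip might look like the following sketch. It assumes ';' as the delimiter and uses an in-memory io.StringIO holding the first three values of each example file in place of real files.

```python
import csv
import io

# Hypothetical ';'-separated content: one row per original file
text = "32676;;90\n255;35;88\n551;86;442\n"

# 1. read into a list of lists with the csv module
rows = list(csv.reader(io.StringIO(text), delimiter=';'))

# 2. transpose: the i-th output row collects the i-th value of every input row
transposed = [list(col) for col in zip(*rows)]

# 3. write back with the same delimiter
out = io.StringIO()
csv.writer(out, delimiter=';', lineterminator='\n').writerows(transposed)
print(out.getvalue())
```

The empty field of the first file survives the round trip as an empty cell, which matches the ";35;86" line of the desired output.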
Or, if you have numpy or pandas installed, either of them will do the job in at most three lines of code, following the workflow read file / transpose matrix / write transposed result to file.
So, based on your code, I would read all files, keep them in memory, and then write the transposed result. I think changing this portion will do it (I did not test it myself).
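A possible pandas version of that workflow, as a sketch: read_csv(..., header=None, dtype=str) reads each one-row file without treating its values as a header, concat stacks the rows, .T transposes, and to_csv writes the result. The file names and the io.StringIO contents are hypothetical; with real files you would pass the paths from glob instead.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the three input files (one row each)
files = {
    'file1.csv': io.StringIO("32676;;90\n"),
    'file2.csv': io.StringIO("255;35;88\n"),
    'file3.csv': io.StringIO("551;86;442\n"),
}

# read: dtype=str keeps values as text; empty fields become NaN
df = pd.concat(
    [pd.read_csv(f, sep=';', header=None, dtype=str) for f in files.values()],
    ignore_index=True,
)
# transpose: each file becomes a column
transposed = df.T
# write: file names as the header row, NaN written as an empty field
out = io.StringIO()
transposed.to_csv(out, sep=';', header=list(files), index=False)
print(out.getvalue())
```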
# keep the filenames aside: they become the header row, not part of the data
files = sorted(glob.glob(os.path.join(path, '*.csv')))
file_rows = []  # one list of values per input file, i.e. a list of lists
# get all values
for filename in files:
    with open(filename, 'r', newline='') as csvfile:
        textcsv = csv.reader(csvfile, delimiter=';')
        for row in textcsv:
            file_rows.append(row)  # adjusted for list in lists
# the output file must be opened for writing
with open('transposed.csv', 'w', newline='') as f:
    gf = csv.writer(f, delimiter=';')
    gf.writerow([os.path.basename(name) for name in files])
    gf.writerows(zip(*file_rows))
You will get odd results if the original files do not each contain exactly one row.
Update: I made a small example that does work.
files = list('abcd')
file_rows = [files]
for filename in [range(i, i+4) for i in range(0, 12, 4)]:
    tmp_rows = []
    fake_csv = [list(filename)]
    for row in fake_csv:
        tmp_rows += [row]  # change to [row, row] to see what happens
                           # in case of multiple rows in original csv
    file_rows += tmp_rows
transposed = list(zip(*file_rows))
print(transposed)
After writing that test code, I adjusted the original code a bit so that it builds a list of lists. So if you still get odd results after that change, it is because your input data is not uniform, and you need to decide how to deal with that. zip, for example, silently stops at the end of the shortest original row, so any extra values in longer rows are dropped. To fix that, you need to pad the rows so that they all have the same length as the longest one.
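Rather than padding the rows by hand, the standard library's itertools.zip_longest can do it while transposing: it keeps going until the longest row is exhausted and fills the gaps with fillvalue. The uneven rows below are hypothetical.

```python
from itertools import zip_longest

# Hypothetical rows of unequal length (e.g. one file has fewer values)
file_rows = [['32676', '', '90', '5'], ['255', '35', '88'], ['551', '86']]

# zip() stops at the shortest row and silently drops the rest
truncated = list(zip(*file_rows))
print(truncated)

# zip_longest() pads the shorter rows with fillvalue instead
padded = list(zip_longest(*file_rows, fillvalue=''))
print(padded)
```

With fillvalue='' the padded cells are written as empty fields by csv.writer, just like the empty values already present in the input.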