
Converting csv rows to columns with Python

Could you help me, please?

I need to convert:

  • one row with multiple columns from several files

TO

  • one file

WHERE

  • the number of columns equals the number of files
  • and the number of rows equals the number of columns in the input files.

Input Files

File 1: 32676;;90;5;22;...;4

File 2: 255;35;88;17;;...;151

File 3: 551;86;442;;78;...;20

Output File

32676;255;551

;35;86

90;88;442

5;17;

22;;78

...;...;...

4;151;20
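For reference, the transformation asked for here (one row per input file becomes one column in the output) can be sketched with `itertools.zip_longest`, using only the sample values shown above (the elided "..." parts are left out):

```python
from itertools import zip_longest

# Sample rows from the three input files (the "..." values are omitted)
file1 = ['32676', '', '90', '5', '22', '4']
file2 = ['255', '35', '88', '17', '', '151']
file3 = ['551', '86', '442', '', '78', '20']

# zip_longest pads shorter rows with '' so no trailing value is dropped
columns = zip_longest(file1, file2, file3, fillvalue='')
lines = [';'.join(col) for col in columns]
print('\n'.join(lines))
```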

Thanks a lot for your help!

My code:

import csv
import glob
import os

path = r'D:\Users\mim\Desktop\SI\Test_cvs'    # raw strings avoid backslash escapes
pathglobalcsv = r'D:\Users\mim\Desktop\SI'

#create a new file
globalfile = open(os.path.join(pathglobalcsv, 'global.csv'), 'w+')

#write filenames as column names
files = os.listdir(path)
globalfile.write(';'.join(files))
globalfile.write('\n')

#get all values
for filename in glob.glob(os.path.join(path, '*.csv')):
    csvfile = open(filename, 'r')
    textcsv = csv.reader(csvfile, delimiter=';')
    globalfile.write(zip(*textcsv))    # <-- this line raises the TypeError below

I get this error:

Traceback (most recent call last):
  File "C:\Users\mim\eclipse-workspace\test\csv_global.py", line 86, in <module>
    globalfile.write(zip(*textcsv))
TypeError: expected a string or other character buffer object

I found one solution...

globalfile = open(os.path.join(pathglobalcsv, 'global.csv'), 'wb')
for filename in glob.glob(os.path.join(path, '*.csv')):
    with open(filename, 'r') as csvfile:
        textcsv = csv.reader(csvfile, delimiter=';')
        for row in textcsv:
            textlist = zip(list(row))
            column = pd.DataFrame(textlist)
            column.to_csv(globalfile, sep=';', header=False, index=False)

But actually the result is :

32676

90

5

22

...

4

255

35

88

17

...

151

How can I write the values from the second file next to 32676, as a second column, instead of below it? Thanks a lot!

--- * . * ---

[CORRECT ANSWER]:

import csv
import glob
import os
import pandas as pd

path = r'D:\Users\mim\Desktop\SI\Test'        # raw strings avoid backslash escapes
pathglobalcsv = r'D:\Users\mim\Desktop\SI'

#use the filenames as column names (escape any ';' so the header stays aligned)
files = os.listdir(path)
header = [f.replace(';', '\\;') for f in files]

#collect one column per input file
outputDF = pd.DataFrame()
for filename in glob.glob(os.path.join(path, '*.csv')):
    with open(filename, 'r') as csvfile:
        textcsv = csv.reader(csvfile, delimiter=';')
        for row in textcsv:
            column = pd.DataFrame(list(row))    # one value per row = one column
            outputDF = pd.concat([outputDF, column], axis=1)

#write all columns to one common csv file
outputfile = os.path.join(pathglobalcsv, 'global.csv')
outputDF.to_csv(outputfile, sep=';', header=header, index=False)

A few hints on how to use zip to merge data as well as transpose lists. It sounds like how to transpose a csv is your actual question. The answer is to read it into a list of lists (for example via the csv module), transpose that with zip, and write it back to a file (if wanted).

row1 = [1,2,3]

row2 = ['a', 'b', 'c']

list(zip(row1, row2))
Out[45]: [(1, 'a'), (2, 'b'), (3, 'c')]

z = list(zip(row1, row2))

list(zip(*z))
Out[47]: [(1, 2, 3), ('a', 'b', 'c')]

y = list(zip(*z))

y
Out[49]: [(1, 2, 3), ('a', 'b', 'c')]

list(zip(*y))
Out[50]: [(1, 'a'), (2, 'b'), (3, 'c')]

Or, if you have numpy or pandas installed, either of those will do the job in at most 3 lines of code with the workflow read_file / transpose_matrix / write_transposed_to_file.
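A minimal sketch of the pandas variant, using an in-memory buffer in place of real files:

```python
import io

import pandas as pd

# read / transpose / write, with StringIO standing in for real files
src = io.StringIO('1;2;3\n4;5;6\n')
df = pd.read_csv(src, sep=';', header=None)
out = io.StringIO()
df.T.to_csv(out, sep=';', header=False, index=False)
print(out.getvalue())
```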

So based on your code, I would read all the files into memory and then do the transposed writing. I think if you change this portion it will work (I did not test it myself):

#write filenames like column names
files = os.listdir(path)
#globalfile.write(';'.join(files))
#globalfile.write('\n')

file_rows = [files] # adjusted so that its a list in list

#get all values
for filename in glob.glob(os.path.join(path, '*.csv')):
    tmp_rows = []
    with open(filename, 'r') as csvfile:
        textcsv = csv.reader(csvfile, delimiter=';')
        for row in textcsv:
            tmp_rows += [row] # adjusted for list in lists
    file_rows += tmp_rows
with open('transposed.csv', 'w', newline='') as f:    # 'w' mode was missing
    gf = csv.writer(f, delimiter=';')
    gf.writerows(zip(*file_rows))

You will get funny results if you don't strictly have 1 row per original file.

Update: I made a small example that does work.

files = list('abcd')
file_rows = [files]
for filename in [range(i, i+4) for i in range(0, 12, 4)]:
    tmp_rows = []
    fake_csv = [list(filename)]
    for row in fake_csv:
        tmp_rows += [row] # change to [row, row] to see what happens
                          # in case of multiple rows in original csv
    file_rows += tmp_rows
transposed = list(zip(*file_rows))
print(transposed)

After doing that test code I adjusted the original code a bit to make it lists in a list; that's the only change. So if you still get funny results after that change, it is because your input data is not uniform, and you need to decide how to deal with that: zip, for example, will silently stop at the length of the shortest list across all original rows. To fix that, pad every list to the same length as the longest row.
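The shortest-row behaviour, and the padding fix, can be illustrated with `itertools.zip_longest`:

```python
from itertools import zip_longest

rows = [['a', 'b', 'c'], ['1', '2'], ['x']]

# plain zip silently stops at the shortest row
short = list(zip(*rows))

# zip_longest pads missing cells with a fill value instead
padded = list(zip_longest(*rows, fillvalue=''))
print(short)
print(padded)
```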
