
How to sort several CSV files by a column and merge them into one, using Python?

I have a lot of CSV files, each with 3 columns, like this:

Example file names: file_1, file_4, file_5, file_7, etc.
(All the files share the same name except for the number at the end. The numbers are not consecutive, as in the example.)


The contents look like this:

['357', '29384', '0.0031545741324921135']
['357', '29389', '0.0031545741324921135']
['357', '29526', '0.0368574903844921735']
['357', '35516', '0.0036775741324564665']
['357', '35551', '0.0023554341325646453']
['357', '35639', '0.0064467781324766535']
['357', '36238', '0.0067543874132467543']
['357', '37162', '0.0031545746577921135']

Let's name the 3 columns [a, b, c]. I'd like to sort the rows by c, the last column. I have to read all the files and sort their combined content into one huge file. I could use a pickle, for example.

My first idea was:

import csv
from operator import itemgetter
fn = 1
# N as the max number in the really last file
while fn < N:
   newfile = open("file_{fn}.csv","r")
   reader = csv.reader(newfile)

   file = open("BigSortedFile.csv","w")

   for line in sorted(reader, key=itemgetter(2)):
   file.write(line)

   fn = fn +1
file.close()

#after the loop I think I have to sort again the BigSortedFile.

But it's not working because file.write() needs a string, and each line coming from the reader is a list. How can I do the whole process?

To sort all the lines you need to read them all into one data structure, sort it, and then write the rows out again.

The csv module needs you to open files with newline="" to work properly. When you use a csv.reader to read, you can also use a csv.writer to write your data:
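To illustrate the error from the question, here is a small sketch that uses in-memory io.StringIO buffers in place of real files (that substitution is my assumption, it just keeps the example self-contained). Writing a parsed row directly fails because it is a list, while csv.writer formats the row for you:

```python
import csv
import io

row = ["357", "29384", "0.0031545741324921135"]  # one row as returned by csv.reader

# A file-like object only accepts strings, so writing the list directly fails.
plain = io.StringIO()
try:
    plain.write(row)
    write_error = None
except TypeError as exc:
    write_error = exc

# csv.writer joins the fields with commas and appends the line terminator itself.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(row)
written = buffer.getvalue()
```

With a real file the pattern is the same: wrap the file object in csv.writer and call writerow (or writerows) instead of file.write.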

import csv

first = 1  # first file has number 1 in its filename
N = 42     # last number in the filenames is 42

data = []
for fn in range(first, N + 1):
    try:
        with open(f"file_{fn}.csv", "r", newline="") as newfile:
            data.extend(csv.reader(newfile))
    except FileNotFoundError:
        continue  # the file numbers are not consecutive, so skip the gaps

# column c holds numbers stored as text, so compare them as floats
data.sort(key=lambda row: float(row[2]))

with open("BigSortedFile.csv", "w", newline="") as bf:
    writer = csv.writer(bf)
    writer.writerows(data)
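Since the file numbers are not consecutive, another option is to let glob find the files instead of counting. The sketch below is only one possible way to do it: the helper name merge_sorted, the pattern file_*.csv, and the tiny sample files written to a temporary directory are all assumptions for illustration. It also assumes that all rows fit in memory and that the last column always parses as a float.

```python
import csv
import glob
import os
import tempfile


def merge_sorted(pattern, out_path, column=2):
    """Read every CSV matching pattern, sort all rows numerically by column, write them to out_path."""
    rows = []
    for path in glob.glob(pattern):
        with open(path, "r", newline="") as src:
            rows.extend(csv.reader(src))
    # Sort by the numeric value, not by the text of the field.
    rows.sort(key=lambda row: float(row[column]))
    with open(out_path, "w", newline="") as dst:
        csv.writer(dst).writerows(rows)
    return rows


# Small demo with made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "file_1.csv": [["357", "29526", "0.0368574903844921735"]],
        "file_4.csv": [
            ["357", "29384", "0.0031545741324921135"],
            ["357", "35551", "0.0023554341325646453"],
        ],
    }
    for name, sample_rows in samples.items():
        with open(os.path.join(tmp, name), "w", newline="") as f:
            csv.writer(f).writerows(sample_rows)

    merged = merge_sorted(
        os.path.join(tmp, "file_*.csv"),
        os.path.join(tmp, "BigSortedFile.csv"),
    )
```

Make sure the output file name does not match the input pattern, otherwise a second run would read the merged file back in as input.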

