
How to sort a CSV file by two columns in Python?

I have a csv file that contains 6 columns.

I want to sort it by col #2 and then by col #3.

My current code is creating a blank file:

import csv
with open('original.csv', mode='rt') as f, open('sorted.csv', 'w') as final:
        writer = csv.writer(final, delimiter='\t')
        reader = csv.reader(f, delimiter=',')
        _ = next(reader)
        sorted1 = sorted(reader, key=lambda row: int(row[1]))
        sorted2 = sorted(reader, key=lambda row: int(row[2]))
        for row in sorted2:
            writer.writerow(row)

What am I doing wrong?

Your output file is empty because

sorted2 = sorted(reader, key=lambda row: int(row[2]))

is trying to sort the data from reader, but a csv.reader is an iterator that can only be consumed once, and the previous sorting statement has already read all the data, so there's nothing left for the reader to read. However, you don't really want to re-sort the data from reader; you want to re-sort the data in sorted1, like this:

import csv

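# the csv module docs recommend opening files with newline=''
with open('original.csv', mode='rt', newline='') as f, open('sorted.csv', 'w', newline='') as final:
    writer = csv.writer(final, delimiter='\t')
    reader = csv.reader(f, delimiter=',')
    _ = next(reader)  # skip the header row
    sorted1 = sorted(reader, key=lambda row: int(row[1]))
    # re-sort the already-sorted list, not the exhausted reader
    sorted2 = sorted(sorted1, key=lambda row: int(row[2]))
    for row in sorted2:
        writer.writerow(row)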
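
To see the exhaustion directly, here is a minimal sketch that uses io.StringIO as a hypothetical stand-in for original.csv (the header and rows are made up). The second sorted() call gets an empty list because the first one already consumed every row:

```python
import csv
import io

# Hypothetical in-memory stand-in for original.csv: a header plus three data rows.
data = io.StringIO("id,a,b\nx,2,1\ny,1,2\nz,1,1\n")
reader = csv.reader(data)
next(reader)  # skip the header row

first = sorted(reader, key=lambda row: int(row[1]))   # reads every remaining row
second = sorted(reader, key=lambda row: int(row[2]))  # reader is already exhausted

print(len(first))   # 3
print(len(second))  # 0
```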

OTOH, there's no need to do the sorting in two passes. You can do it in a single pass by changing the key function so that it returns a tuple.

import csv

with open('original.csv', mode='rt', newline='') as f, open('sorted.csv', 'w', newline='') as final:
    writer = csv.writer(final, delimiter='\t')
    reader = csv.reader(f, delimiter=',')
    _ = next(reader)  # skip the header row
    # the tuple key compares row[1] first and uses row[2] only to break ties
    sorted2 = sorted(reader, key=lambda row: (int(row[1]), int(row[2])))
    for row in sorted2:
        writer.writerow(row)

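That key function first compares items by their row[1] values, and only if those values are identical does it compare them by their row[2] values. Note that this is not the same ordering as the two-pass version: Python's sort is stable, so sorting by row[1] and then re-sorting by row[2] makes row[2] the primary key. So it may not give the ordering that you actually want. If col #3 should take priority, reverse the order of those tests: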

key=lambda row: (int(row[2]), int(row[1])) 
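
A quick check with a few hypothetical in-memory rows ([name, col #2, col #3]): the two stable passes produce the same order as the single sort with key (row[2], row[1]), and a different order from the (row[1], row[2]) key:

```python
# Hypothetical rows: [name, col #2, col #3]
rows = [["x", "2", "1"], ["y", "1", "2"], ["z", "1", "1"], ["w", "2", "2"]]

# Two stable passes: sort by row[1], then re-sort that result by row[2].
two_pass = sorted(rows, key=lambda row: int(row[1]))
two_pass = sorted(two_pass, key=lambda row: int(row[2]))

# One pass with row[2] as the primary key gives the same order...
reversed_key = sorted(rows, key=lambda row: (int(row[2]), int(row[1])))
# ...while row[1] as the primary key does not.
col2_first = sorted(rows, key=lambda row: (int(row[1]), int(row[2])))

print([row[0] for row in two_pass])      # ['z', 'x', 'y', 'w']
print([row[0] for row in reversed_key])  # ['z', 'x', 'y', 'w']
print([row[0] for row in col2_first])    # ['z', 'y', 'x', 'w']
```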

As Peter Wood mentions in the comments, Writer objects have a writerows method that will write all the rows in one call. This is more efficient than writing the rows one by one in a for loop.
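
A minimal sketch of writerows, writing to an io.StringIO instead of sorted.csv so it runs on its own (the rows are hypothetical). Note that csv.writer ends each row with "\r\n" by default:

```python
import csv
import io

# Hypothetical, already-sorted rows.
rows = [["z", "1", "1"], ["y", "1", "2"], ["x", "2", "1"]]

out = io.StringIO()
writer = csv.writer(out, delimiter="\t")
writer.writerows(rows)  # one call instead of a for loop over writerow()

print(repr(out.getvalue()))  # 'z\t1\t1\r\ny\t1\t2\r\nx\t2\t1\r\n'
```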

BTW, there's no need to do this assignment:

_ = next(reader)

I guess it makes it clear that you're discarding the 1st row, but you could just write the call without performing an assignment:

next(reader)
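
If you'd rather keep the header in the output than discard it, you can bind the result of next(reader) and write it back before the sorted rows. This is a sketch with hypothetical in-memory data (io.StringIO in place of the real files):

```python
import csv
import io

# Hypothetical stand-ins for original.csv and sorted.csv.
src = io.StringIO("name,a,b\nx,2,1\ny,1,2\n")
out = io.StringIO()
reader = csv.reader(src)
writer = csv.writer(out, delimiter="\t")

header = next(reader)  # keep the first row instead of discarding it
writer.writerow(header)
writer.writerows(sorted(reader, key=lambda row: (int(row[1]), int(row[2]))))
```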

With pandas you can do this more simply:

import pandas as pd

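df = pd.read_csv('original.csv')  # comma-separated input; the first row becomes the column names

# 'col1' and 'col2' are placeholders for your actual column names;
# ascending is applied to 'col1' and 'col2' respectively.
df = df.sort_values(['col1', 'col2'], ascending=[True, True])

df.to_csv('sorted.csv', sep='\t', index=False)  # tab-separated, without the index column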

See the pandas documentation for read_csv and sort_values.

The lambda function can return a tuple:

sorted(reader, key=lambda row: (int(row[1]), int(row[2])))
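
Tuples compare element by element, so the first component decides and the second only breaks ties. A small sketch on hypothetical rows; it also shows a common trick (not from the answers above) of negating a numeric component to sort that column in descending order:

```python
# Hypothetical rows: [name, col #2, col #3]
rows = [["x", "2", "1"], ["y", "1", "2"], ["z", "1", "1"]]

# Both columns ascending: col #2 first, col #3 only breaks ties.
asc = sorted(rows, key=lambda row: (int(row[1]), int(row[2])))

# col #2 ascending, col #3 descending (negating works for numeric values only).
mixed = sorted(rows, key=lambda row: (int(row[1]), -int(row[2])))

print([row[0] for row in asc])    # ['z', 'y', 'x']
print([row[0] for row in mixed])  # ['y', 'z', 'x']
```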

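Try this. Note that csv.DictReader returns each row as a dict keyed by the header names, so the key function has to index by name rather than by position:

import csv

with open('original.csv', mode='r', newline='') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=',')
    # 'col2' and 'col3' are placeholders for your actual header names
    sortedlist = sorted(reader, key=lambda row: (int(row['col2']), int(row['col3'])))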
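
A self-contained sketch of the DictReader approach, using io.StringIO and made-up header names (name, qty, rank) in place of the real file:

```python
import csv
import io

# Hypothetical data; DictReader takes the keys from the first (header) row.
src = io.StringIO("name,qty,rank\nx,2,1\ny,1,2\nz,1,1\n")
reader = csv.DictReader(src)

# Values are strings, so convert to int for a numeric sort.
rows = sorted(reader, key=lambda row: (int(row["qty"]), int(row["rank"])))

print([row["name"] for row in rows])  # ['z', 'y', 'x']
```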

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. Any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM