I have a csv file that contains 6 columns.
I want to sort it by col #2 and then by col #3.
My current code is creating a blank file:
import csv
with open('original.csv', mode='rt') as f, open('sorted.csv', 'w') as final:
    writer = csv.writer(final, delimiter='\t')
    reader = csv.reader(f, delimiter=',')
    _ = next(reader)
    sorted1 = sorted(reader, key=lambda row: int(row[1]))
    sorted2 = sorted(reader, key=lambda row: int(row[2]))
    for row in sorted2:
        writer.writerow(row)
What am I doing wrong?
Your output file is empty because
sorted2 = sorted(reader, key=lambda row: int(row[2]))
is trying to sort the data from reader, but the previous sorting statement has already read all of the data, so there's nothing left for the reader to read. However, you don't really want to re-sort the data from reader; you want to re-sort the data in sorted1, like this:
import csv
with open('original.csv', mode='rt') as f, open('sorted.csv', 'w') as final:
    writer = csv.writer(final, delimiter='\t')
    reader = csv.reader(f, delimiter=',')
    _ = next(reader)
    sorted1 = sorted(reader, key=lambda row: int(row[1]))
    sorted2 = sorted(sorted1, key=lambda row: int(row[2]))
    for row in sorted2:
        writer.writerow(row)
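To see the exhaustion for yourself, here is a minimal sketch that uses an in-memory file in place of original.csv (the sample data is made up for illustration). The second sorted call receives an iterator that has already been consumed, so it returns an empty list:

```python
import csv
import io

# Hypothetical in-memory CSV standing in for original.csv.
data = io.StringIO("id,a,b\n1,30,2\n2,10,1\n3,20,2\n")
reader = csv.reader(data)
next(reader)  # skip the header row

# The first sort consumes every remaining row from the reader.
first_pass = sorted(reader, key=lambda row: int(row[1]))
# The reader is now exhausted, so there is nothing left to sort.
second_pass = sorted(reader, key=lambda row: int(row[2]))

print(len(first_pass))   # 3
print(len(second_pass))  # 0
```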
OTOH, there's no need to do the sorting in two passes. You can do it in a single pass by changing the key function.
import csv
with open('original.csv', mode='rt') as f, open('sorted.csv', 'w') as final:
    writer = csv.writer(final, delimiter='\t')
    reader = csv.reader(f, delimiter=',')
    _ = next(reader)
    sorted2 = sorted(reader, key=lambda row: (int(row[1]), int(row[2])))
    for row in sorted2:
        writer.writerow(row)
That key function first compares items by their row[1] values, and if those values are identical it then compares them by their row[2] values. That may not give the ordering that you actually want. You may want to reverse the order of those tests:
key=lambda row: (int(row[2]), int(row[1]))
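The order matters because Python's sort is stable: sorting by row[1] and then re-sorting by row[2] leaves row[2] as the primary key. A small sketch with made-up rows (the labels r1 to r4 are only for illustration) compares the two-pass sort with both tuple keys:

```python
# Hypothetical rows: a label, then the two numeric columns as strings.
rows = [
    ["r1", "2", "9"],
    ["r2", "1", "9"],
    ["r3", "2", "5"],
    ["r4", "1", "7"],
]

# Two passes: sort by row[1], then re-sort that result by row[2].
two_pass = sorted(sorted(rows, key=lambda r: int(r[1])), key=lambda r: int(r[2]))

# Single pass with row[2] as the primary key and row[1] as the tie-breaker.
by_2_then_1 = sorted(rows, key=lambda r: (int(r[2]), int(r[1])))

# Single pass with row[1] as the primary key and row[2] as the tie-breaker.
by_1_then_2 = sorted(rows, key=lambda r: (int(r[1]), int(r[2])))

print([r[0] for r in two_pass])     # ['r3', 'r4', 'r2', 'r1']
print([r[0] for r in by_2_then_1])  # ['r3', 'r4', 'r2', 'r1']
print([r[0] for r in by_1_then_2])  # ['r4', 'r2', 'r3', 'r1']
```

So if you want "by col #2, then by col #3" in the usual sense, the (row[1], row[2]) key is probably the one you are after.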
As Peter Wood mentions in the comments, Writer objects have a writerows method that will write all the rows in one call. This is more efficient than writing the rows one by one in a for loop.
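A minimal sketch of writerows, writing to an in-memory buffer rather than sorted.csv so it can be run as-is (the sample rows are made up):

```python
import csv
import io

# Hypothetical rows that have already been read from the input file.
rows = [["3", "20", "2"], ["1", "30", "2"], ["2", "10", "1"]]

out = io.StringIO()  # stands in for the open 'sorted.csv' file object
writer = csv.writer(out, delimiter="\t")

# Sort once, then hand the whole list to writerows instead of looping.
writer.writerows(sorted(rows, key=lambda row: (int(row[1]), int(row[2]))))

lines = out.getvalue().splitlines()
print(lines)
```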
BTW, there's no need to do this assignment:
_ = next(reader)
I guess it makes it clear that you're discarding the 1st row, but you could just write the call without performing an assignment:
next(reader)
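If the input could be empty, or you want to keep the header so you can write it back out, one possible variant is to pass a default to next. This is only a sketch on in-memory data; the name header and the sample rows are assumptions, not part of the original code:

```python
import csv
import io

# Hypothetical in-memory CSV standing in for original.csv.
source = io.StringIO("id,a,b\n1,30,2\n2,10,1\n")
reader = csv.reader(source)

# With a default, next returns None instead of raising StopIteration
# when there is no first row to read.
header = next(reader, None)
rows = sorted(reader, key=lambda row: (int(row[1]), int(row[2])))

# The same call on an empty input simply yields the default.
empty_header = next(csv.reader(io.StringIO("")), None)
```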
With pandas this becomes simple:
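import pandas as pd
# original.csv is comma-separated, which is read_csv's default delimiter
df = pd.read_csv('original.csv')
# 'col1' and 'col2' are placeholders for your real column names; ascending is applied to each respectively
df = df.sort_values(['col1', 'col2'], ascending=[True, True])
df.to_csv('sorted.csv', index=False)  # index=False keeps the row index out of the output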
The lambda function can return a tuple:
sorted(reader, key=lambda row: (int(row[1]), int(row[2])))
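Try this:
import csv
with open('original.csv', mode='r', newline='') as csvfile:
    # DictReader yields dicts keyed by header name, so look the columns up by name, not by index
    reader = csv.DictReader(csvfile, delimiter=',')
    col2, col3 = reader.fieldnames[1], reader.fieldnames[2]
    sortedlist = sorted(reader, key=lambda row: (int(row[col2]), int(row[col3])))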