
Sorting a CSV file by multiple columns by magnitude in Python

I want to sort a CSV file first by one column and then by another.

I have tried a few of the approaches I found online for sorting a CSV file by multiple columns. The problem is that the values are compared character by character, from left to right, so I get 1, 10, 100, 101, 102, ... when I want 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ...

I've used this discussion: Sorting CSV in Python, and this module: csvsort

I'd appreciate any reference or code.

It sounds like what you want is numeric sorting, but what you're getting is alphabetic (lexicographic) sorting. Alphabetic order would be 1, 10, 100, 2, etc., while numeric order would be 1, 2, 10, 100.

The data you're trying to sort is probably in string format when it's read in from the CSV, so you need to convert it to an int before Python's sort function compares it.

If you are sorting a plain list of numeric strings, you can do this by passing key=int to sorted() or list.sort(), which calls int() on each member being sorted. If you are sorting CSV rows (each row is a list of strings), the key has to pick the column first, for example key=lambda row: int(row[0]).
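As a small illustration of the difference, here is a sketch with made-up values (the list and the two-column rows are not from the question):

```python
values = ["1", "10", "100", "2", "11", "3"]

# Default sort compares the strings character by character.
as_text = sorted(values)

# key=int compares by integer value; the items themselves stay strings.
as_numbers = sorted(values, key=int)

print(as_text)     # ['1', '10', '100', '11', '2', '3']
print(as_numbers)  # ['1', '2', '3', '10', '11', '100']

# For CSV-style rows the key has to select a column before converting it.
rows = [["b", "10"], ["a", "2"], ["c", "1"]]
rows_by_second = sorted(rows, key=lambda row: int(row[1]))
print(rows_by_second)  # [['c', '1'], ['a', '2'], ['b', '10']]
```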

More information can be found here: How to sort a list numerically?

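I think something like this should work (Python 3):

import csv

with open("file.csv", newline="") as f:
    rows = list(csv.reader(f))

# Convert column 3 to int so rows are compared by numeric value.
# Change 3 to the index of the column you want to sort on.
sortedlist = sorted(rows, key=lambda row: int(row[3]))

# In Python 3 the csv module wants text mode with newline="" (not 'wb').
with open("sorted_file.csv", "w", newline="") as f:
    csv.writer(f).writerows(sortedlist)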

You just have to convert the key to int type when sorting.
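Since the question asks for sorting by one column and then by another, the same idea can be extended with a tuple key. The following is only a sketch: the function name sort_csv_numeric and its parameters are made up for illustration, and it assumes every cell in the chosen columns parses as a number.

```python
import csv


def sort_csv_numeric(in_path, out_path, columns, has_header=True):
    """Sort rows by the given column indexes, comparing values as numbers."""
    with open(in_path, newline="") as src:
        reader = csv.reader(src)
        # Keep the header aside so it is not sorted with the data.
        header = next(reader, None) if has_header else None
        # float() accepts both "35" and "1.75". The tuple makes the sort
        # compare the first listed column, then the next one on ties.
        rows = sorted(
            reader,
            key=lambda row: tuple(float(row[i]) for i in columns),
        )
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
```

For example, sort_csv_numeric("file.csv", "sorted_file.csv", [3, 0]) would sort on column 3 and break ties on column 0.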

Python is impressive !!

If you are using the pandas library (version > 0.25), you can use sort_values:

import pandas as pd
df = pd.read_csv('biostats.csv')
df

   Name  Sex  Age  Height (in)  Weight (lbs)
0  Alex    M   41           74           170
1  Page    F   31           67           135
2  Quin    M   29           71           176
3  Ruth    F   28           65           131
4  Ruth    F   59           75           131
5  Quin    M   19           55            46

df.sort_values(['Name', 'Sex'], ascending=[True, True])

   Name  Sex  Age  Height (in)  Weight (lbs)
0  Alex    M   41           74           170
1  Page    F   31           67           135
2  Quin    M   29           71           176
5  Quin    M   19           55            46
3  Ruth    F   28           65           131
4  Ruth    F   59           75           131
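The example above sorts on two text columns. For sorting by magnitude, a sketch along the following lines may help. It inlines the same table as comma-separated text (an assumption about how biostats.csv is laid out) and relies on read_csv inferring integer types for the numeric columns.

```python
import io

import pandas as pd

# Same rows as the table above, inlined so the example is self-contained.
data = io.StringIO(
    "Name,Sex,Age,Height (in),Weight (lbs)\n"
    "Alex,M,41,74,170\n"
    "Page,F,31,67,135\n"
    "Quin,M,29,71,176\n"
    "Ruth,F,28,65,131\n"
    "Ruth,F,59,75,131\n"
    "Quin,M,19,55,46\n"
)
df = pd.read_csv(data)

# Sort by Sex ascending, then by Age descending within each Sex.
# Age was parsed as an integer column, so it is compared by value.
by_sex_age = df.sort_values(["Sex", "Age"], ascending=[True, False])
print(by_sex_age)
```

sort_values returns a new DataFrame and leaves df unchanged. If a numeric column ends up as strings (for example because of stray text in the file), converting it with pd.to_numeric before sorting is one option.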

I have a csv file with the data like input.csv:

1285,375,2.0,3.5,2473
260,380,2.0,3.5,3780
2205,35,1.0,1.75,4829
245,25,1.0,1.75,5632
570,1520,1.0,1.75,8240
465,35,1.0,1.75,10287
3325,35,1.0,0.75,20788
2480,75,1.0,1.75,23589
0,15,4.0,7.0,48424

When using operator.itemgetter:

import csv
import operator

inputfile="input.csv"

with open(inputfile, newline='') as csvfile:
    next(csvfile)  # skip the header line (remove this if the file has no header)
    outcsv = csv.reader(csvfile, delimiter=',', quotechar='|')
    sorted_csv = sorted(outcsv, key = operator.itemgetter(0))

    for eachline in sorted_csv:
        print(eachline)

The output I get is sorted alphabetically on the first column:

['0', '15', '4.0', '7.0', '48424']
['1285', '375', '2.0', '3.5', '2473']
['2205', '35', '1.0', '1.75', '4829']
['245', '25', '1.0', '1.75', '5632']
['2480', '75', '1.0', '1.75', '23589']
['260', '380', '2.0', '3.5', '3780']
['3325', '35', '1.0', '0.75', '20788']
['465', '35', '1.0', '1.75', '10287']
['570', '1520', '1.0', '1.75', '8240']

To sort the CSV file on the first column by numeric value instead, I did the following:

import csv
inputfile="input.csv"

with open(inputfile, newline='') as csvfile:
    next(csvfile)  # skip the header line (remove this if the file has no header)
    outcsv = csv.reader(csvfile, delimiter=',', quotechar='|')
    sorted_csv = sorted(outcsv, key = lambda start_time: int(start_time[0]))

    for eachline in sorted_csv:
        print(eachline)

The output is as expected:

['0', '15', '4.0', '7.0', '48424']
['245', '25', '1.0', '1.75', '5632']
['260', '380', '2.0', '3.5', '3780']
['465', '35', '1.0', '1.75', '10287']
['570', '1520', '1.0', '1.75', '8240']
['1285', '375', '2.0', '3.5', '2473']
['2205', '35', '1.0', '1.75', '4829']
['2480', '75', '1.0', '1.75', '23589']
['3325', '35', '1.0', '0.75', '20788']

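To sort on any other column by magnitude (the numeric value), simply replace the column index 0 in the line below. For columns that hold decimals, such as 2.0 or 1.75, use float() instead of int(), because int('2.0') raises a ValueError: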

    sorted_csv = sorted(outcsv, key = lambda start_time: int(start_time[0]))
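To sort by one column and then by another, as the question asks, the key can return a tuple. This is a sketch using the sample rows from input.csv, inlined with io.StringIO so it runs on its own; the choice of columns 2 and 1 is arbitrary.

```python
import csv
import io

data = io.StringIO("""\
1285,375,2.0,3.5,2473
260,380,2.0,3.5,3780
2205,35,1.0,1.75,4829
245,25,1.0,1.75,5632
570,1520,1.0,1.75,8240
465,35,1.0,1.75,10287
3325,35,1.0,0.75,20788
2480,75,1.0,1.75,23589
0,15,4.0,7.0,48424
""")
rows = list(csv.reader(data))

# Sort by column 2 (decimal), then by column 1 (integer) where column 2 ties.
by_two = sorted(rows, key=lambda row: (float(row[2]), int(row[1])))

# Negating a numeric key reverses the order for that column only:
# column 2 ascending, column 1 descending.
mixed = sorted(rows, key=lambda row: (float(row[2]), -int(row[1])))

for row in by_two:
    print(row)
```

Python's sort is stable, so rows that tie on every key (the three rows with 1.0 and 35 here) keep their original relative order.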
