简体   繁体   中英

Sort a text file by multiple columns

I have a text file that have about 10 columns, 7 of those are date/time stamps (year, month, day, hour, minute, second, centisec). So the data looks roughly like...

User[TAB]System[TAB]Year[TAB]Month[TAB]Day ... centisec[TAB]Message

Sorry for the horrible formatting, but I hope this gives you the idea.

So if I wanted to sort the file by years, I could use this

sorted_lines = sorted(unsortedfile,key=lambda l: int(l.split('\t')[2]))

Take the unsorted file, split the lines by tabs, 3rd column, change it to an int and sort by that. I could do the same thing for any one column.

What I'm looking for is a better way to sort this by all the date/time columns. So sort by Year, then by month, then by day... etc

I can think of a few complicated ways of doing this (reading each line combining all the columns, sorting it... or doing a recursive sort by each column), but I'm hoping someone has a simpler, more pythonic, way to do the same thing.

You can use csv module to parse the file with delimiter='\\t' and apply sorted() on the reader object with a custom key function that parses the date to the datetime object:

import csv
from datetime import datetime
from pprint import pprint


def sort_by_datetime(line):
    return datetime.strptime('{0}-{1}-{2}'.format(*line[2:5]), '%Y-%m-%d')


with open('input.txt') as f:
    reader = csv.reader(f, delimiter='\t')
    pprint(sorted(reader, key=sort_by_datetime))

For the input.txt :

User1 System1 2013 1 31
User2 System2 2014 12 1
User3 System3 2012 12 31
User4 System4 2012 6 15
User5 System5 2014 1 1

it would print:

[['User4', 'System4', '2012', '6', '15'],
 ['User3', 'System3', '2012', '12', '31'],
 ['User1', 'System1', '2013', '1', '31'],
 ['User5', 'System5', '2014', '1', '1'],
 ['User2', 'System2', '2014', '12', '1']]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM