Read specific date range from csv file using Python

I have a CSV file with over 60 million records in the following format:

2013-07-23 17:04:34, some data, some more data   
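Here is a minimal sketch of turning one such record into a real timestamp. It assumes every line starts with a `YYYY-MM-DD HH:MM:SS` field followed by comma-separated data. The standard-library `csv` module does the splitting, and `skipinitialspace=True` drops the blank after each comma. `parse_record` is a hypothetical helper name.

```python
import csv
from datetime import datetime

# Assumed layout of the first field of every record
TIMESTAMP_FORMAT = '%Y-%m-%d %H:%M:%S'

def parse_record(line):
    """Split one CSV line into (datetime, [remaining fields])."""
    # csv.reader accepts any iterable of strings, so wrap the single line in a list
    fields = next(csv.reader([line], skipinitialspace=True))
    stamp = datetime.strptime(fields[0].strip(), TIMESTAMP_FORMAT)
    return stamp, fields[1:]

stamp, rest = parse_record('2013-07-23 17:04:34, some data, some more data')
```

In a real script you would pass the open file to `csv.reader` once, rather than wrapping each line individually.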

I want to write a Python script that lets a user enter a "From" and a "To" date in the format YYYY-MM-DD (for example "2013-04-23" and "2013-04-25"). I then want to search for all records within that range and display them.

I'm using Python 2.7 on a Windows 7 machine (developing in Eclipse), but once it's complete, this script will run on a Red Hat Linux server.

A scaled-down version of what I have is:

import re

if __name__ == '__main__':
    from_date = raw_input('\nEnter FROM Date (e.g. 2013-11-29) :')
    from_date += ' 00:00:00'
    print('From date: = ' + from_date)
    to_date = raw_input('\nEnter TO Date (e.g. 2013-11-30) :')
    to_date += ' 23:59:59'

    # Open the file; looping over the path string would iterate over its characters
    with open('./file.csv') as in_file:
        for line in in_file:
            fields = line.split(',')
            # re.match only succeeds when the timestamp starts with the exact
            # text entered, so this finds exact matches rather than a range
            found_from_date = re.match(from_date, fields[0])
            if found_from_date:
                found_to_date = re.match(to_date, fields[0])
                if found_to_date:
                    print(line)

As you can see, I'm currently using regex, but of course that means I only pick up exact matches. I could write some code that splits up each date field and compares each individual part, but I was hoping there's a date-range function I can use.
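Because the timestamps are zero-padded and ordered from year down to second, comparing them as plain strings already sorts them chronologically. A range check therefore does not strictly need any date parsing. This is a minimal sketch that assumes every line starts with a timestamp in exactly that layout. `lines_in_range` is a hypothetical name, and any iterable of lines (such as an open file) can be passed in.

```python
def lines_in_range(lines, from_date, to_date):
    """Yield lines whose 'YYYY-MM-DD ...' prefix falls in [from_date, to_date].

    from_date and to_date are 'YYYY-MM-DD' strings; both ends are inclusive.
    """
    for line in lines:
        # The first 10 characters are the date part of the timestamp
        day = line[:10]
        if from_date <= day <= to_date:
            yield line

sample = [
    '2013-04-22 23:59:59, a, b\n',
    '2013-04-23 00:00:00, c, d\n',
    '2013-04-25 17:04:34, e, f\n',
    '2013-04-26 00:00:01, g, h\n',
]
matches = list(lines_in_range(sample, '2013-04-23', '2013-04-25'))
```

This trick only holds while every timestamp keeps the zero-padded layout. A value such as "2013-7-23" would compare incorrectly.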

I did a bit of Googling and came across something called pandas, but before downloading and learning it I wanted to make sure there isn't something more standard or easier that can be updated using the Red Hat package manager.
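The range check itself needs nothing beyond the standard library. Suppose the file happens to be sorted by timestamp (an assumption, since the question does not say so). In that case, `itertools.dropwhile` and `itertools.takewhile` can skip to the first matching line and stop at the first line past the range, instead of scanning all 60 million records. `sorted_range` is a hypothetical helper name.

```python
from itertools import dropwhile, takewhile

def sorted_range(lines, from_date, to_date):
    """Return an iterator over lines in [from_date, to_date].

    ASSUMES lines are sorted by timestamp. Dates are 'YYYY-MM-DD' strings
    compared against each line's first 10 characters. dropwhile still reads
    (but discards) every line before the range.
    """
    after_start = dropwhile(lambda line: line[:10] < from_date, lines)
    return takewhile(lambda line: line[:10] <= to_date, after_start)

sample = [
    '2013-04-22 10:00:00, a\n',
    '2013-04-23 09:00:00, b\n',
    '2013-04-24 12:00:00, c\n',
    '2013-04-26 08:00:00, d\n',
    '2013-04-27 08:00:00, e\n',
]
result = list(sorted_range(sample, '2013-04-23', '2013-04-25'))
```

Once `takewhile` reaches the first line after `to_date` it stops, so the rest of the file is never read.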

Any advice would be greatly appreciated.

Thanks in advance.

The datetime module is your friend here, since it has built-in support for comparing dates. I can't recall whether there's a method that takes a preformatted string and converts it to a datetime.date, but it's simple enough to parse that bit out:

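import datetime

if __name__ == '__main__':
    # Turn "YYYY-MM-DD" into a datetime.date by splitting on '-'
    from_raw = raw_input('\nEnter FROM Date (e.g. 2013-11-29) :')
    from_date = datetime.date(*map(int, from_raw.split('-')))
    print 'From date: = ' + str(from_date)
    to_raw = raw_input('\nEnter TO Date (e.g. 2013-11-30) :')
    to_date = datetime.date(*map(int, to_raw.split('-')))

    # Open the file; iterating over the path string would loop over its characters
    with open('./file.csv') as in_file:
        for line in in_file:
            fields = line.split(',')
            # The first field is "YYYY-MM-DD HH:MM:SS"; keep only the date part
            found_date = datetime.date(*map(int, fields[0].split(' ')[0].split('-')))
            # Chained comparison: inclusive at both ends
            if from_date <= found_date <= to_date:
                print line.rstrip('\n')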
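The method the answer is trying to recall does exist. `datetime.datetime.strptime` parses a string against a format, and `.date()` drops the time part. Below is a sketch of the same inclusive range check using it, written so it runs on both Python 2.7 and 3. `parse_day` and `record_in_range` are hypothetical names.

```python
from datetime import datetime

def parse_day(text):
    """Parse a 'YYYY-MM-DD' string into a datetime.date."""
    return datetime.strptime(text.strip(), '%Y-%m-%d').date()

def record_in_range(line, from_date, to_date):
    """Return True if the line's timestamp falls on a day in [from_date, to_date]."""
    # Only the first field is needed, so split once
    stamp = line.split(',', 1)[0]
    found = datetime.strptime(stamp.strip(), '%Y-%m-%d %H:%M:%S').date()
    return from_date <= found <= to_date

start = parse_day('2013-04-23')
end = parse_day('2013-04-25')
```

A side benefit is that `strptime` raises `ValueError` for malformed input such as "2013-13-40". That gives a natural place to re-prompt the user.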

Have a look at dateutil (http://labix.org/python-dateutil). Perhaps rrule.between(after, before, inc=False) is what you're after?
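`rrule.between` returns the occurrences of a recurrence rule that fall between two datetimes, so it generates dates rather than filtering records. One way to apply that idea is to build the set of day strings in the range and test each line's date prefix for membership. The sketch below builds that set with the standard library's `timedelta` instead of dateutil, so it needs no extra package. `days_between` is a hypothetical helper.

```python
from datetime import date, timedelta

def days_between(start, end):
    """Return the set of 'YYYY-MM-DD' strings for every day from start to end, inclusive."""
    days = set()
    current = start
    while current <= end:
        days.add(current.isoformat())
        current += timedelta(days=1)
    return days

wanted = days_between(date(2013, 4, 23), date(2013, 4, 25))
lines = ['2013-04-22 08:00:00, a\n', '2013-04-24 08:00:00, b\n']
# Set membership test on the first 10 characters (the date part) of each line
hits = [line for line in lines if line[:10] in wanted]
```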
