
Using Python to read .CSV files and then compare the columns/rows

I am currently trying to develop a program that reads data from a text file and returns the pair of employees that has worked together for the longest time. I decided to store it in .CSV format, as that is still a plain text format, just separated with commas.

Example:

EmpID,ProjectID,DateFrom,DateTo
1,A,2014-11-01,2015-05-01
2,B,2013-12-06,2014-10-06
2,C,2014-01-07,2016-03-07
3,B,2015-06-04,2017-09-04
5,C,2014-10-01,2015-12-01
1,A,2013-03-07,2015-11-07
2,C,2015-07-09,2019-01-19
3,B,2013-11-13,2014-03-13
4,C,2016-02-14,NULL
5,D,2014-03-15,2015-11-09

I have learned how to read .CSV files, but I am not sure what the best approach is for the next step (comparing the values, etc.). For now, I decided that this is the cleanest option:

import csv

with open('data.csv', 'r') as f:
  reader = csv.reader(f)
  your_list = list(reader)

print(your_list)

I would just like some advice on whether comparing indexes of the list is the best way to go. I was also thinking about dictionaries, but I am not sure, which is why I am asking here :) SQL is not an option, even though it would be so easy with it. Sorry if this is a bad question, but I am currently learning Python and this is quite an important task for me. Thanks!

As I understand from what you wrote, I think what you need is something like this:

import csv

# read the csv and split each line on ","
with open('data.csv', newline='') as f:
    csv_file = csv.reader(f, delimiter=",")
    for item in csv_file:
        print(item)  # do your work here

You could also look at Pandas if you have a large amount of data. It will be more efficient to work with Pandas in that case.
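On the dictionary idea from the question: here is a minimal sketch, using only the standard library, that reads each row as a dict with csv.DictReader and groups the rows by ProjectID. The helper name group_by_project and the SAMPLE string are made up for illustration; SAMPLE just stands in for data.csv so the snippet runs on its own.

```python
import csv
import io
from collections import defaultdict

# Stand-in for open('data.csv', newline=''); a few rows from the question.
SAMPLE = """EmpID,ProjectID,DateFrom,DateTo
1,A,2014-11-01,2015-05-01
2,B,2013-12-06,2014-10-06
2,C,2014-01-07,2016-03-07
3,B,2015-06-04,2017-09-04
"""


def group_by_project(f):
    """Return {ProjectID: [row_dict, ...]} for a CSV file object with a header row."""
    rows_by_project = defaultdict(list)
    for row in csv.DictReader(f):
        # Columns are looked up by header name instead of by list index.
        rows_by_project[row['ProjectID']].append(row)
    return rows_by_project


rows_by_project = group_by_project(io.StringIO(SAMPLE))
for project, rows in sorted(rows_by_project.items()):
    print(project, [r['EmpID'] for r in rows])
```

Looking up row['DateFrom'] tends to be easier to read than row[2], and grouping by project means only employees on the same project need to be compared with each other.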

You can use the datetime package to compute the total time elapsed. Create a list of the people in the CSV file, then sort the list by elapsed time. For the first 8 rows of the CSV file (the later row with NULL is left out because its end date is undefined):

1,A,2014-11-01,2015-05-01
2,B,2013-12-06,2014-10-06
2,C,2014-01-07,2016-03-07
3,B,2015-06-04,2017-09-04
5,C,2014-10-01,2015-12-01
1,A,2013-03-07,2015-11-07
2,C,2015-07-09,2019-01-19
3,B,2013-11-13,2014-03-13

You can use this:

from datetime import datetime

with open('file.txt', 'r') as file:
    my_list = list()
    for line in file:
        list_ = line.strip().split(',')
        # %m is the month; %M would be parsed as minutes
        dt1 = datetime.strptime(list_[2], '%Y-%m-%d')
        dt2 = datetime.strptime(list_[3], '%Y-%m-%d')
        my_list.append(list_[:2] + [dt2 - dt1])

# sort once, after all rows are read, by elapsed time
my_list.sort(key=lambda x: x[2])
print(my_list)

output:

[['3', 'B', datetime.timedelta(days=120)], ['1', 'A', datetime.timedelta(days=181)], ['2', 'B', datetime.timedelta(days=304)], ['5', 'C', datetime.timedelta(days=426)], ['2', 'C', datetime.timedelta(days=790)], ['3', 'B', datetime.timedelta(days=823)], ['1', 'A', datetime.timedelta(days=975)], ['2', 'C', datetime.timedelta(days=1290)]]
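Sorting by elapsed time ranks individual rows, but the question asks for the pair of employees who spent the most time together. Here is a rough sketch of one way to get there, under two assumptions that are mine and not from the question: "together" means being on the same project on the same day, and NULL in DateTo means the employee is still on the project, so today's date is used. The names parse_day, best_pair and SAMPLE are made up for illustration, and SAMPLE stands in for the file.

```python
import csv
import io
from collections import defaultdict
from datetime import date, datetime
from itertools import combinations

# Stand-in for open('data.csv', newline=''); these are the rows from the question.
SAMPLE = """EmpID,ProjectID,DateFrom,DateTo
1,A,2014-11-01,2015-05-01
2,B,2013-12-06,2014-10-06
2,C,2014-01-07,2016-03-07
3,B,2015-06-04,2017-09-04
5,C,2014-10-01,2015-12-01
1,A,2013-03-07,2015-11-07
2,C,2015-07-09,2019-01-19
3,B,2013-11-13,2014-03-13
4,C,2016-02-14,NULL
5,D,2014-03-15,2015-11-09
"""


def parse_day(text, today):
    # Assumption: NULL means "still on the project", so fall back to today.
    if text == 'NULL':
        return today
    return datetime.strptime(text, '%Y-%m-%d').date()


def best_pair(f, today=None):
    """Return ((emp_a, emp_b), days) for the pair with the most shared project days."""
    if today is None:
        today = date.today()
    # project -> employee -> set of day ordinals worked on that project
    days = defaultdict(lambda: defaultdict(set))
    for row in csv.DictReader(f):
        start = parse_day(row['DateFrom'], today)
        end = parse_day(row['DateTo'], today)
        # Half-open range [start, end); using a set merges overlapping stints
        # of the same employee so no day is counted twice.
        days[row['ProjectID']][row['EmpID']].update(
            range(start.toordinal(), end.toordinal()))
    together = defaultdict(int)
    for by_emp in days.values():
        for a, b in combinations(sorted(by_emp), 2):
            together[(a, b)] += len(by_emp[a] & by_emp[b])
    if not together:
        return None
    return max(together.items(), key=lambda item: item[1])


# today is pinned here only to make the example reproducible.
pair, shared_days = best_pair(io.StringIO(SAMPLE), today=date(2020, 1, 1))
print(pair, shared_days)
```

With the ten rows from the question, this reports employees 2 and 4, who share 1070 days on project C. Storing one set entry per day is simple but not the most efficient option; for large files, merging date intervals per employee and intersecting those would scale better.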
