Using Python to read .CSV files and then compare the columns/rows

Question

I am trying to develop a program that reads data from a text file and returns the pair of employees who have worked together the longest. I decided to store the data in .CSV format, as that is still plain text, just separated with commas.

Example:

EmpID,ProjectID,DateFrom,DateTo
1,A,2014-11-01,2015-05-01
2,B,2013-12-06,2014-10-06
2,C,2014-01-07,2016-03-07
3,B,2015-06-04,2017-09-04
5,C,2014-10-01,2015-12-01
1,A,2013-03-07,2015-11-07
2,C,2015-07-09,2019-01-19
3,B,2013-11-13,2014-03-13
4,C,2016-02-14,NULL
5,D,2014-03-15,2015-11-09

I have learned how to read .CSV files, but I am not sure what the best approach is for the next step (comparing the values, etc.). For now, I decided that this is the cleanest option:

import csv

# newline='' is the mode the csv module recommends for files it reads
with open('data.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    your_list = list(reader)

print(your_list)

I would just like some advice on whether comparing indexes of the list is the best way to go. I was also thinking about dictionaries, but I am not sure, which is why I am asking here :) SQL is not an option, even though it would be so easy with it. Sorry if this is a bad question, but I am currently learning Python and this is quite an important task for me. Thanks!

Answer 1

If I understand your question correctly, you need something like this:

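import csv

# Read the CSV and split each line on ","
# (Python 3: open in text mode with newline='', not "rb")
with open('data.csv', newline='') as f:
    csv_file = csv.reader(f, delimiter=',')
    for item in csv_file:
        # item is a list of strings, e.g. ['1', 'A', '2014-11-01', '2015-05-01']
        pass  # do your work here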

You could also look at pandas if you have a large amount of data; it is likely to be more efficient in that case.
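To make the "do your work" step more concrete, and since the question asks about dictionaries versus list indexes, here is a minimal sketch that reads rows by column name with csv.DictReader and groups them by project in a dictionary. The helper names parse_day and load_by_project are hypothetical, and treating NULL as "today" is an assumption, not something the question states.

```python
import csv
import io
from collections import defaultdict
from datetime import date, datetime


def parse_day(text, today=None):
    """Parse 'YYYY-MM-DD'. Assumption: 'NULL' means still ongoing, so use `today`."""
    if text == "NULL":
        return today or date.today()
    return datetime.strptime(text, "%Y-%m-%d").date()


def load_by_project(f, today=None):
    """Return {ProjectID: [(EmpID, start_date, end_date), ...]}."""
    by_project = defaultdict(list)
    for row in csv.DictReader(f):  # the header line supplies the keys
        by_project[row["ProjectID"]].append(
            (
                row["EmpID"],
                parse_day(row["DateFrom"], today),
                parse_day(row["DateTo"], today),
            )
        )
    return by_project


# In-memory sample for illustration; with a real file, pass
# open('data.csv', newline='') instead of the StringIO object.
sample = io.StringIO(
    "EmpID,ProjectID,DateFrom,DateTo\n"
    "1,A,2014-11-01,2015-05-01\n"
    "4,C,2016-02-14,NULL\n"
)
projects = load_by_project(sample, today=date(2019, 5, 29))
print(projects["C"])
```

Grouping by ProjectID keeps the later comparison simple: only employees that appear under the same key can have worked together, so there is no need to compare every row against every other row by index.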

Answer 2

You can use the datetime package to compute the time elapsed in each row. Build a list of the rows in the CSV file, then sort it by elapsed time. For the first 8 data rows of the CSV file (the row with NULL is left out because NULL is undefined), saved without the header line:

1,A,2014-11-01,2015-05-01
2,B,2013-12-06,2014-10-06
2,C,2014-01-07,2016-03-07
3,B,2015-06-04,2017-09-04
5,C,2014-10-01,2015-12-01
1,A,2013-03-07,2015-11-07
2,C,2015-07-09,2019-01-19
3,B,2013-11-13,2014-03-13

You can use this:

from datetime import datetime

with open('file.txt', 'r') as file:
    my_list = []
    for line in file:
        list_ = line.strip().split(',')
        # %m is the month; %M would be parsed as minutes
        dt1 = datetime.strptime(list_[2], '%Y-%m-%d')
        dt2 = datetime.strptime(list_[3], '%Y-%m-%d')
        my_list.append(list_[:2] + [dt2 - dt1])

# sort once, after all rows have been read, by elapsed time
my_list.sort(key=lambda x: x[2])
print(my_list)

Output:

[['3', 'B', datetime.timedelta(days=120)], ['1', 'A', datetime.timedelta(days=181)], ['2', 'B', datetime.timedelta(days=304)], ['5', 'C', datetime.timedelta(days=426)], ['2', 'C', datetime.timedelta(days=790)], ['3', 'B', datetime.timedelta(days=823)], ['1', 'A', datetime.timedelta(days=975)], ['2', 'C', datetime.timedelta(days=1290)]]
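The durations above are per row, while the question asks for the pair of employees who worked together the longest, which needs the overlap between two employees' date ranges on the same project. The following is a rough sketch of one way to do that, not a definitive solution. It assumes NULL means the assignment is still open and ends on a supplied `today` date (here 2019-05-29, chosen only for illustration), treats the end date as exclusive, and uses the hypothetical names parse_day and longest_pair. Days are stored as sets so that an employee's overlapping stints on one project are not counted twice.

```python
import csv
import io
from collections import defaultdict
from datetime import date, datetime
from itertools import combinations

SAMPLE = """\
EmpID,ProjectID,DateFrom,DateTo
1,A,2014-11-01,2015-05-01
2,B,2013-12-06,2014-10-06
2,C,2014-01-07,2016-03-07
3,B,2015-06-04,2017-09-04
5,C,2014-10-01,2015-12-01
1,A,2013-03-07,2015-11-07
2,C,2015-07-09,2019-01-19
3,B,2013-11-13,2014-03-13
4,C,2016-02-14,NULL
5,D,2014-03-15,2015-11-09
"""


def parse_day(text, today):
    # Assumption: NULL means the assignment has not ended yet.
    if text == "NULL":
        return today
    return datetime.strptime(text, "%Y-%m-%d").date()


def longest_pair(f, today):
    # project -> employee -> set of day ordinals worked (end date exclusive)
    days_on = defaultdict(lambda: defaultdict(set))
    for row in csv.DictReader(f):
        start = parse_day(row["DateFrom"], today).toordinal()
        end = parse_day(row["DateTo"], today).toordinal()
        days_on[row["ProjectID"]][row["EmpID"]].update(range(start, end))

    # (emp_a, emp_b) -> days shared, summed over all common projects
    together = defaultdict(int)
    for employees in days_on.values():
        for a, b in combinations(sorted(employees), 2):
            together[(a, b)] += len(employees[a] & employees[b])

    if not together:
        return None  # no project has more than one employee
    return max(together.items(), key=lambda item: item[1])


best = longest_pair(io.StringIO(SAMPLE), today=date(2019, 5, 29))
print(best)
```

Under those assumptions the sample data yields employees 2 and 4, with 1070 shared days on project C. Storing one set entry per day is simple but memory-hungry; for large files, merging each employee's intervals and intersecting the ranges directly would be the leaner choice.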

2 answers

solution1
1 2019-05-29 15:47:15

solution2
1 2019-05-29 16:37:57
