Using Python to read .CSV files and then compare the columns/rows

I am currently trying to develop a program that reads data from a text file and returns the pair of employees that has worked together the longest. I decided to do it in .CSV format, as that is still a plain-text format, just separated with commas.

Example:

EmpID,ProjectID,DateFrom,DateTo
1,A,2014-11-01,2015-05-01
2,B,2013-12-06,2014-10-06
2,C,2014-01-07,2016-03-07
3,B,2015-06-04,2017-09-04
5,C,2014-10-01,2015-12-01
1,A,2013-03-07,2015-11-07
2,C,2015-07-09,2019-01-19
3,B,2013-11-13,2014-03-13
4,C,2016-02-14,NULL
5,D,2014-03-15,2015-11-09

Now, I have learned how to read .CSV files, but I am not sure what the best approach is for the next step (comparing the values, etc.). For now, I decided that this is the cleanest option:

import csv

with open('data.csv', 'r') as f:
  reader = csv.reader(f)
  your_list = list(reader)

print(your_list)

I just want a piece of advice on whether comparing indexes of the list is the best way to go. I was also thinking about dictionaries, but I am not sure, which is why I am asking here :) SQL is not an option, even though it would be so easy with it. Sorry if this is a bad question, but I am currently learning Python and this is quite an important task for me. Thanks!

From what you wrote, I think you need something like this:

import csv

# Read the CSV file; csv.reader splits each line on ","
with open('data.csv', newline='') as f:
    csv_file = csv.reader(f, delimiter=",")
    for item in csv_file:
        print(item)  # do your work with each row here

Maybe you can look at Pandas too if you have a large amount of data. It will be more efficient to work with Pandas in that case.
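On the dictionary question: one possible way to fill in the loop body, using only the standard library, is to read each row with csv.DictReader and group the rows in a dictionary keyed by ProjectID. This is only a sketch under a few assumptions: the names SAMPLE, parse_date and load_by_project are made up for illustration, NULL is assumed to mean "still ongoing" and is replaced by a cutoff date, and the sample is a shortened copy of the question's data held in a string so the snippet runs without a file.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# A few rows from the question; in practice use open('data.csv', newline='').
SAMPLE = """EmpID,ProjectID,DateFrom,DateTo
1,A,2014-11-01,2015-05-01
2,C,2014-01-07,2016-03-07
5,C,2014-10-01,2015-12-01
4,C,2016-02-14,NULL
"""


def parse_date(text, today):
    # Assumption: NULL means the assignment has not ended yet.
    return today if text == "NULL" else date.fromisoformat(text)


def load_by_project(f, today):
    """Return {ProjectID: [(EmpID, DateFrom, DateTo), ...]}."""
    by_project = defaultdict(list)
    for row in csv.DictReader(f):
        by_project[row["ProjectID"]].append(
            (row["EmpID"],
             parse_date(row["DateFrom"], today),
             parse_date(row["DateTo"], today))
        )
    return by_project


# A fixed cutoff keeps the result reproducible; date.today() would also work.
projects = load_by_project(io.StringIO(SAMPLE), today=date(2020, 1, 1))
for project_id, stints in sorted(projects.items()):
    print(project_id, stints)
```

Grouping by project first means only employees who share a project ever need to be compared, which is easier to follow than comparing list indexes.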

You can use the datetime module to compute the total time elapsed. Create a list of the entries in the CSV file, then sort the list by elapsed time. For the first 8 rows of the CSV file (because NULL is undefined!):

1,A,2014-11-01,2015-05-01
2,B,2013-12-06,2014-10-06
2,C,2014-01-07,2016-03-07
3,B,2015-06-04,2017-09-04
5,C,2014-10-01,2015-12-01
1,A,2013-03-07,2015-11-07
2,C,2015-07-09,2019-01-19
3,B,2013-11-13,2014-03-13

You can use this:

from datetime import datetime

with open('file.txt', 'r') as file:
    my_list = []
    for line in file:
        fields = line.strip().split(',')
        # %m is the month; %M would parse the field as minutes
        dt1 = datetime.strptime(fields[2], '%Y-%m-%d')
        dt2 = datetime.strptime(fields[3], '%Y-%m-%d')
        my_list.append(fields[:2] + [dt2 - dt1])

# Sort once, after all rows have been read, by elapsed time
my_list.sort(key=lambda x: x[2])
print(my_list)

Output:

[['3', 'B', datetime.timedelta(days=120)], ['1', 'A', datetime.timedelta(days=181)], ['2', 'B', datetime.timedelta(days=304)], ['5', 'C', datetime.timedelta(days=426)], ['2', 'C', datetime.timedelta(days=790)], ['3', 'B', datetime.timedelta(days=823)], ['1', 'A', datetime.timedelta(days=975)], ['2', 'C', datetime.timedelta(days=1290)]]
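Sorting by elapsed time ranks individual assignments, but the question asks for the pair of employees who spent the most time together. A rough sketch of that step follows, under stated assumptions: "together" is taken to mean overlapping dates on the same project, NULL is treated as a cutoff date passed in as today, the end date is not counted as a worked day, and the helper names merge and days_together are invented for this example. Each employee's overlapping stints on a project are merged first so that no day is counted twice (employee 2 has two overlapping stints on project C in the sample).

```python
from collections import defaultdict
from datetime import date
from itertools import combinations

# Rows from the question as (EmpID, ProjectID, DateFrom, DateTo); None is NULL.
ROWS = [
    ("1", "A", date(2014, 11, 1), date(2015, 5, 1)),
    ("2", "B", date(2013, 12, 6), date(2014, 10, 6)),
    ("2", "C", date(2014, 1, 7), date(2016, 3, 7)),
    ("3", "B", date(2015, 6, 4), date(2017, 9, 4)),
    ("5", "C", date(2014, 10, 1), date(2015, 12, 1)),
    ("1", "A", date(2013, 3, 7), date(2015, 11, 7)),
    ("2", "C", date(2015, 7, 9), date(2019, 1, 19)),
    ("3", "B", date(2013, 11, 13), date(2014, 3, 13)),
    ("4", "C", date(2016, 2, 14), None),
    ("5", "D", date(2014, 3, 15), date(2015, 11, 9)),
]


def merge(intervals):
    """Merge overlapping (start, end) ranges so no day is counted twice."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def days_together(rows, today):
    """Return {(emp_a, emp_b): days both were on the same project}."""
    # project -> employee -> list of (start, end)
    stints = defaultdict(lambda: defaultdict(list))
    for emp, project, start, end in rows:
        stints[project][emp].append((start, today if end is None else end))

    totals = defaultdict(int)
    for by_emp in stints.values():
        for a, b in combinations(sorted(by_emp), 2):
            for s1, e1 in merge(by_emp[a]):
                for s2, e2 in merge(by_emp[b]):
                    shared = (min(e1, e2) - max(s1, s2)).days
                    totals[(a, b)] += max(0, shared)
    return dict(totals)


# Fixed cutoff for NULL so the result is reproducible (an assumption).
totals = days_together(ROWS, today=date(2020, 1, 1))
best = max(totals, key=totals.get)
print(best, totals[best])
```

With the sample data and this cutoff, the pair that comes out on top is employees 2 and 4 with 1070 shared days on project C. Whether the last day should count as worked, and how ties should be broken, depends on the actual requirements.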
