
How to Calculate the Average Time between Dates in a CSV File with Multiple Columns in Python?

I have a CSV file with three columns, namely Year, Month and Day. If I print it out, it looks like this:

csv_reader = [['2016', '6', '22'], ['2016', '10', '2'], ['2016', '11', '1'], ['2016', '11', '3'], ['2016', '11', '3'], ['2016', '11', '17'], ['2016', '11', '17'], ['2016', '11', '17'], ['2016', '12', '2'], ['2016', '12', '12'], ['2016', '12', '22'], ['2016', '12', '22'], ['2017', '1', '11'], ['2017', '3', '11'], ['2017', '3', '11'], ['2017', '5', '12'], ['2017', '5', '12'], ['2017', '5', '12']]

So both the CSV file and each row are lists. I want to calculate the average number of days between consecutive dates in this dataset. I tried to use:

from datetime import date
for value in csv_reader:
    dates = date(int(value))
    differences = [(dates[i]-dates[i-1]).days for i in range(1, len(dates))]
print(float(sum(differences))/len(differences))

But it returned an error stating that

TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'

and I doubt whether my code is actually correct. The expected result should be the sum of the differences between consecutive dates, divided by the number of differences, which gives the average number of days. So, could any of you give advice on how to perform this task?

So we have

csv_reader = [['2016', '6', '22'], ['2016', '10', '2'], ['2016', '11', '1'], ['2016', '11', '3'], ['2016', '11', '3'], ['2016', '11', '17'], ['2016', '11', '17'], ['2016', '11', '17'], ['2016', '12', '2'], ['2016', '12', '12'], ['2016', '12', '22'], ['2016', '12', '22'], ['2017', '1', '11'], ['2017', '3', '11'], ['2017', '3', '11'], ['2017', '5', '12'], ['2017', '5', '12'], ['2017', '5', '12']]

First, to get a valid date from, let's say, the first item, you need to convert each str to int, and only then call date:

date(*[int(d) for d in csv_reader[0]])

And you need a date instance for each of the two dates so that you can subtract them:

date(*[int(d) for d in csv_reader[0]]) - date(*[int(d) for d in csv_reader[1]])

Afterwards, you get a timedelta object, which has a .days attribute. Sometimes it'll be negative, so you'll need to take the absolute value with abs.
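
To make the timedelta step concrete, here is a small sketch using the first two rows of the data above. The names first, second and gap are only illustrative and do not appear in the original answer.

```python
from datetime import date

first = date(2016, 6, 22)
second = date(2016, 10, 2)

# Subtracting a later date from an earlier one gives a negative timedelta
gap = first - second
print(gap.days)       # -102
print(abs(gap.days))  # 102
```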

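After you sum all those days, you need to take the average. Note that n dates produce n - 1 differences, so the average gap is the total divided by len(csv_reader) - 1; dividing by len(csv_reader) gives a slightly smaller figure.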

All in all, this is the loop you need:

from datetime import date

total = 0
for i in range(len(csv_reader) - 1):
    # Absolute gap, in days, between each date and the next one
    total += abs((date(*[int(d) for d in csv_reader[i]]) - date(*[int(d) for d in csv_reader[i+1]])).days)

>>> total
324
>>> # 18 dates give 17 gaps, so divide by the number of gaps
>>> round(float(total) / (len(csv_reader) - 1), 2)
19.06

  1. Convert the list of str rows to a list of dates (sortedtime here is the list of rows, presumably csv_reader in the question, and the datetime module must be imported):

     dates = [datetime.date(*[int(d) for d in ds]) for ds in sortedtime]
  2. Zip the list of dates with a copy of itself shifted by one position, and take the difference in days for each pair:

     delta_days = [(d_t[0] - d_t[1]).days for d_t in list(zip(dates[1:],dates))]
  3. The average is then a simple operation:

     avg_days = sum(delta_days)/len(delta_days)
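
Put together, the three steps might look like the sketch below. It assumes the rows are already in chronological order, as they are in the question, and it uses a shortened sample list called rows (a hypothetical name) so that it runs on its own.

```python
import datetime

# A shortened sample of the question's data (assumed already sorted by date)
rows = [['2016', '6', '22'], ['2016', '10', '2'], ['2016', '11', '1'], ['2016', '11', '3']]

# Step 1: turn each [year, month, day] row of strings into a date
dates = [datetime.date(*[int(d) for d in ds]) for ds in rows]

# Step 2: pair each date with its successor and take the gap in days
delta_days = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]

# Step 3: average over the number of gaps (one fewer than the number of dates)
avg_days = sum(delta_days) / len(delta_days)

print(delta_days)          # [102, 30, 2]
print(round(avg_days, 2))  # 44.67
```

Unpacking each pair as earlier, later is only a readability choice; it computes the same differences as indexing d_t[0] and d_t[1].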

You can't just convert the string '2016,6,22' to int. You should delete the commas before you pass a string into the int() function.

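Try changing date(int(value)) to date(int(value.replace(',',''))). Note, however, that in the question each value is a list such as ['2016', '6', '22'] rather than a single string (hence the 'list' in the TypeError), so it has no replace method, and date() needs three separate integers.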
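
For the case this answer seems to have in mind, where a row really does arrive as one comma-separated string, a minimal sketch would split on the commas instead of deleting them, since date() expects year, month and day as separate integers. The name raw is hypothetical.

```python
from datetime import date

raw = '2016,6,22'  # hypothetical: a whole row held in one string

# Split into ['2016', '6', '22'], convert each piece, then unpack into date()
parsed = date(*[int(part) for part in raw.split(',')])
print(parsed)  # 2016-06-22
```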
