Get the date range between a given start and end date from a CSV using Python

Question

I have a CSV file with StartDate and EndDate columns, for example 01-02-2020 00:00:00 and 01-03-2020 00:00:00.

I want a Python program that finds the dates between those two dates and appends them as additional rows, like:

So here, instead of the dots, it should increment StartDate and keep EndDate as it is.

import pandas as pd

df = pd.read_csv('MyData.csv')

df['StartDate'] = pd.to_datetime(df['StartDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])
df['Dates'] = [pd.date_range(x, y) for x , y in zip(df['StartDate'],df['EndDate'])]
df = df.explode('Dates')
df

For example, if I have StartDate as 01-02-2020 00:00:00 and EndDate as 05-02-2020 00:00:00,

as a result I should get:

All resulting datetimes should be in the same format as StartDate and EndDate in MyData.csv.

Only StartDate should change; the rest should stay the same.

I tried doing it with a date range, but I am not getting any result. Can anyone please help me with this?

Thanks

Answer 1

My two cents: a very simple solution based only on functions from pandas:

import pandas as pd

# Format of the dates in 'MyData.csv'
DT_FMT = '%m-%d-%Y %H:%M:%S'

df = pd.read_csv('MyData.csv')

# Parse dates with the provided format
for c in ('StartDate', 'EndDate'):
    df[c] = pd.to_datetime(df[c], format=DT_FMT)

# Create the DataFrame with the ranges of dates
date_df = pd.DataFrame(
    data=[[d] + list(row[1:])
          for row in df.itertuples(index=False, name=None)
          for d in pd.date_range(row[0], row[1])],
    columns=df.columns.copy()
)

# Convert dates to strings in the same format of 'MyData.csv'
for c in ('StartDate', 'EndDate'):
    date_df[c] = date_df[c].dt.strftime(DT_FMT)

If df is:

   StartDate    EndDate   A   B   C
0 2020-01-02 2020-01-06  ME  ME  ME
1 2021-05-15 2021-05-18  KI  KI  KI

then date_df will be:

             StartDate              EndDate   A   B   C
0  01-02-2020 00:00:00  01-06-2020 00:00:00  ME  ME  ME
1  01-03-2020 00:00:00  01-06-2020 00:00:00  ME  ME  ME
2  01-04-2020 00:00:00  01-06-2020 00:00:00  ME  ME  ME
3  01-05-2020 00:00:00  01-06-2020 00:00:00  ME  ME  ME
4  01-06-2020 00:00:00  01-06-2020 00:00:00  ME  ME  ME
5  05-15-2021 00:00:00  05-18-2021 00:00:00  KI  KI  KI
6  05-16-2021 00:00:00  05-18-2021 00:00:00  KI  KI  KI
7  05-17-2021 00:00:00  05-18-2021 00:00:00  KI  KI  KI
8  05-18-2021 00:00:00  05-18-2021 00:00:00  KI  KI  KI

Then you can save the result back to a CSV file with the to_csv method.
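
As a rough sketch of that last step, the snippet below writes an already-expanded frame with to_csv. The sample rows, the in-memory io.StringIO buffer used in place of a real file path, and the choice of index=False are assumptions for illustration, not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical expanded result, with dates already formatted as strings
date_df = pd.DataFrame({
    'StartDate': ['01-02-2020 00:00:00', '01-03-2020 00:00:00'],
    'EndDate': ['01-04-2020 00:00:00', '01-04-2020 00:00:00'],
    'A': ['ME', 'ME'],
})

# index=False keeps the row index out of the output; a buffer stands in
# for a file path here so the example does not touch the disk
buffer = io.StringIO()
date_df.to_csv(buffer, index=False)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```

Because the dates were converted to strings beforehand, they are written exactly as formatted, so no date_format argument is needed in this case.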

Answer 2

Does something like this achieve what you want?

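from datetime import datetime, timedelta

import pandas as pd

DT_FMT = "%d-%m-%Y %H:%M:%S"

df = pd.read_csv('MyData.csv')

rows = []
for record in df.to_dict('records'):
    d1 = datetime.strptime(record['StartDate'], DT_FMT)
    d2 = datetime.strptime(record['EndDate'], DT_FMT)
    numdays = (d2 - d1).days
    # One output row per day from StartDate to EndDate (inclusive);
    # only StartDate changes, every other column is copied as-is
    for x in range(numdays + 1):
        day = d1 + timedelta(days=x)
        rows.append({**record, 'StartDate': day.strftime(DT_FMT)})

date_df = pd.DataFrame(rows, columns=df.columns)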
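
The standard-library idea here, stepping forward from the start date with timedelta, can also be sketched on its own without pandas. The helper name expand_dates and the day-first format string are assumptions for illustration; the returned range includes both endpoints.

```python
from datetime import datetime, timedelta

# Assumed day-first format; adjust it to match the CSV
DT_FMT = "%d-%m-%Y %H:%M:%S"


def expand_dates(start, end, fmt=DT_FMT):
    """Return every day from start to end (inclusive) as formatted strings."""
    d1 = datetime.strptime(start, fmt)
    d2 = datetime.strptime(end, fmt)
    numdays = (d2 - d1).days
    # Add days to the start date; range(numdays + 1) includes the end date
    return [(d1 + timedelta(days=x)).strftime(fmt) for x in range(numdays + 1)]


dates = expand_dates("01-02-2020 00:00:00", "05-02-2020 00:00:00")
print(dates)
```

If the end date is earlier than the start date, this sketch returns an empty list rather than raising an error.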

Answer 3

Actually, the code you provided works for me. I guess the only thing you need to change is the date formatting when reading and writing, so that it is consistent with your requirements. In particular, you should use the dayfirst argument when reading and date_format when writing the output file. A toy example is below:

Toy data

StartDate            EndDate              A   B   C
01-02-2020 00:00:00  06-02-2020 00:00:00  ME  ME  ME
01-04-2020 00:00:00  04-04-2020 00:00:00  PE  PE  PE

Sample code

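import pandas as pd

# dayfirst=True parses 01-02-2020 as 1 February 2020
df = pd.read_csv('dataSO.csv', parse_dates=[0, 1], dayfirst=True)
cols = df.columns

# One row per day in each range; the expanded date replaces StartDate
df['Dates'] = [pd.date_range(x, y) for x, y in zip(df['StartDate'], df['EndDate'])]
df1 = df.explode('Dates', ignore_index=True)
df1['StartDate'] = pd.to_datetime(df1['Dates'])
df1 = df1[cols]

# date_format writes the datetimes back in the day-first format of the input
df1.to_csv('resSO.csv', date_format="%d-%m-%Y %H:%M:%S", index=False)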

The output is what you described, except that StartDate is also in datetime format. Does this answer your question?
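
To make the effect of the two arguments concrete, here is a small sketch that parses one sample string with and without dayfirst, then writes it back with date_format. The sample value and the in-memory buffer are assumptions for illustration.

```python
import io

import pandas as pd

raw = "01-02-2020 00:00:00"

# Without dayfirst, pandas reads this string month-first: 2 January 2020
month_first = pd.to_datetime(raw)
# With dayfirst=True, the same string is read as 1 February 2020
day_first = pd.to_datetime(raw, dayfirst=True)
print(month_first.month, day_first.month)

# date_format controls how datetime columns are written by to_csv
buffer = io.StringIO()
pd.DataFrame({'StartDate': [day_first]}).to_csv(
    buffer, date_format="%d-%m-%Y %H:%M:%S", index=False
)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
```

Using the same day-first convention on both the read and the write side is what keeps the output in the same format as the input file.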

3 answers

Answer 1: score 1, accepted, 2022-01-15 12:00:36
Answer 2: score 0, 2022-01-15 10:45:32
Answer 3: score 0, 2022-01-15 12:30:18
