Get range of dates between specified start and end date from CSV using Python
I have a CSV file with StartDate and EndDate columns, for example 01-02-2020 00:00:00 and 01-03-2020 00:00:00.
I want a Python program that finds the dates in between and appends them as the next rows, like:
So here, instead of the dots, it should increment StartDate and keep EndDate as it is.
import pandas as pd
df = pd.read_csv('MyData.csv')
df['StartDate'] = pd.to_datetime(df['StartDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])
df['Dates'] = [pd.date_range(x, y) for x , y in zip(df['StartDate'],df['EndDate'])]
df = df.explode('Dates')
df
So for example, if I have StartDate as 01-02-2020 00:00:00 and EndDate as 05-02-2020 00:00:00
As a result I should get
All the resulting datetimes should be in the same format as StartDate and EndDate in MyData.csv.
Only StartDate will change; the rest should stay the same.
I tried doing it with a date range, but I am not getting any result.
Can anyone please help me with this?
Thanks
My two cents: a very simple solution based only on functions from pandas:
import pandas as pd
# Format of the dates in 'MyData.csv'
DT_FMT = '%m-%d-%Y %H:%M:%S'
df = pd.read_csv('MyData.csv')
# Parse dates with the provided format
for c in ('StartDate', 'EndDate'):
    df[c] = pd.to_datetime(df[c], format=DT_FMT)
# Create the DataFrame with the ranges of dates
# (assumes StartDate and EndDate are the first two columns)
date_df = pd.DataFrame(
    data=[[d] + list(row[1:])
          for row in df.itertuples(index=False, name=None)
          for d in pd.date_range(row[0], row[1])],
    columns=df.columns.copy()
)
# Convert dates to strings in the same format of 'MyData.csv'
for c in ('StartDate', 'EndDate'):
    date_df[c] = date_df[c].dt.strftime(DT_FMT)
If df is:
StartDate EndDate A B C
0 2020-01-02 2020-01-06 ME ME ME
1 2021-05-15 2021-05-18 KI KI KI
then date_df will be:
StartDate EndDate A B C
0 01-02-2020 00:00:00 01-06-2020 00:00:00 ME ME ME
1 01-03-2020 00:00:00 01-06-2020 00:00:00 ME ME ME
2 01-04-2020 00:00:00 01-06-2020 00:00:00 ME ME ME
3 01-05-2020 00:00:00 01-06-2020 00:00:00 ME ME ME
4 01-06-2020 00:00:00 01-06-2020 00:00:00 ME ME ME
5 05-15-2021 00:00:00 05-18-2021 00:00:00 KI KI KI
6 05-16-2021 00:00:00 05-18-2021 00:00:00 KI KI KI
7 05-17-2021 00:00:00 05-18-2021 00:00:00 KI KI KI
8 05-18-2021 00:00:00 05-18-2021 00:00:00 KI KI KI
Then you can save the result back to a CSV file with the to_csv method.
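As a rough sketch of that last step (not part of the original answer): the snippet below writes a small stand-in for date_df with to_csv. The sample rows and the in-memory buffer are assumptions made so the example runs on its own; in practice you would pass a file path such as 'Result.csv' (a hypothetical name) instead of the buffer.

```python
import io
import pandas as pd

# Hypothetical stand-in for date_df; the dates are already strings,
# as they are after the strftime step above.
date_df = pd.DataFrame({
    'StartDate': ['01-02-2020 00:00:00', '01-03-2020 00:00:00'],
    'EndDate': ['01-03-2020 00:00:00', '01-03-2020 00:00:00'],
    'A': ['ME', 'ME'],
})

# An in-memory buffer stands in for a real output path.
buffer = io.StringIO()
# index=False keeps the row index out of the file.
date_df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the date columns were converted to strings beforehand, to_csv writes them exactly as formatted and no date_format argument is needed here.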
Does something like this achieve what you want?
from datetime import datetime, timedelta
FMT = "%d-%m-%Y %H:%M:%S"
date_lists = []
for base, end in zip(df['StartDate'], df['EndDate']):
    d1 = datetime.strptime(base, FMT)
    d2 = datetime.strptime(end, FMT)
    numdays = (d2 - d1).days
    # Count forward from the start date, including the end date
    date_lists.append([d1 + timedelta(days=x) for x in range(numdays + 1)])
# One list of dates per row; explode to get one row per date
df['Dates'] = date_lists
df = df.explode('Dates')
Actually, the code you provided is working for me. I guess the only thing you need to change is the date formatting when reading and writing, to make sure it is consistent with your requirements. In particular, you should use the dayfirst argument when reading and date_format when writing the output file. A toy example below:
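Before the toy example, a quick sketch of why dayfirst matters. It assumes the dates in the file are day-first (the question does not say so explicitly) and parses the same string three ways with pd.to_datetime:

```python
import pandas as pd

raw = '01-02-2020 00:00:00'

# Default parsing reads an ambiguous string as month-first.
month_first = pd.to_datetime(raw)
# dayfirst=True reads the same string as 1 February 2020.
day_first = pd.to_datetime(raw, dayfirst=True)
# An explicit format removes the ambiguity altogether.
explicit = pd.to_datetime(raw, format='%d-%m-%Y %H:%M:%S')

print(month_first)  # 2020-01-02 00:00:00
print(day_first)    # 2020-02-01 00:00:00
print(explicit)     # 2020-02-01 00:00:00
```

If the file is actually month-first, drop dayfirst and use a '%m-%d-%Y' format instead.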
Toy data
StartDate | EndDate | A | B | C
---|---|---|---|---
01-02-2020 00:00:00 | 06-02-2020 00:00:00 | ME | ME | ME
01-04-2020 00:00:00 | 04-04-2020 00:00:00 | PE | PE | PE
Sample code
import pandas as pd
# Parse the first two columns as day-first dates
df = pd.read_csv('dataSO.csv', parse_dates=[0, 1], dayfirst=True)
cols = df.columns
# One date range per row, then one row per date
df['Dates'] = [pd.date_range(x, y) for x, y in zip(df['StartDate'], df['EndDate'])]
df1 = df.explode('Dates')[cols]
# Write the dates back in the original day-first format
df1.to_csv('resSO.csv', date_format="%d-%m-%Y %H:%M:%S", index=False)
And the output is what you described, except for the fact that StartDate is also in datetime format. Does this answer your question?
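For the case where StartDate itself should advance one day per row while everything else stays fixed, here is a minimal sketch. It is not the explode approach used in this answer: it repeats each row with Index.repeat and offsets StartDate with a per-row counter. The day-first format, the sample rows and the in-memory CSV are all assumptions made for illustration.

```python
import io
import pandas as pd

DT_FMT = '%d-%m-%Y %H:%M:%S'  # assumption: day-first dates

# In-memory stand-in for MyData.csv (hypothetical contents)
csv_in = io.StringIO(
    "StartDate,EndDate,A\n"
    "01-02-2020 00:00:00,05-02-2020 00:00:00,ME\n"
    "15-05-2021 00:00:00,16-05-2021 00:00:00,KI\n"
)
df = pd.read_csv(csv_in)
for c in ('StartDate', 'EndDate'):
    df[c] = pd.to_datetime(df[c], format=DT_FMT)

# Number of output rows per input row (both endpoints included);
# assumes EndDate is never earlier than StartDate.
n_days = (df['EndDate'] - df['StartDate']).dt.days + 1

# Repeat each row n_days times, then shift StartDate by 0, 1, 2, ... days
out = df.loc[df.index.repeat(n_days)].copy()
offsets = out.groupby(level=0).cumcount().to_numpy()
out['StartDate'] = out['StartDate'] + pd.to_timedelta(offsets, unit='D')

# Convert back to strings in the original format
for c in ('StartDate', 'EndDate'):
    out[c] = out[c].dt.strftime(DT_FMT)
out = out.reset_index(drop=True)
print(out)
```

With the first sample row, StartDate runs from 01-02-2020 to 05-02-2020 in five rows while EndDate stays at 05-02-2020, which matches the example in the question.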