Get range of dates between specified start and end date from CSV using Python

I have a CSV file with StartDate and EndDate columns, for example 01-02-2020 00:00:00 and 01-03-2020 00:00:00.

I want a Python program that finds the dates between the start and end dates and appends them as new rows, like this:

(image: CSV file)

So instead of the dots, each new row should increment StartDate and keep EndDate as it is.

import pandas as pd

df = pd.read_csv('MyData.csv')

df['StartDate'] = pd.to_datetime(df['StartDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])
df['Dates'] = [pd.date_range(x, y) for x , y in zip(df['StartDate'],df['EndDate'])]
df = df.explode('Dates')
df

For example, if I have StartDate as 01-02-2020 00:00:00 and EndDate as 05-02-2020 00:00:00, I should get this result:

(image: Result)

All the resulting datetimes should be in the same format as StartDate and EndDate in MyData.csv.

Only StartDate will change; the rest should stay the same.

I tried doing it with a date range, but I am not getting any result. Can anyone please help me with this?

Thanks

My two cents: a very simple solution based only on functions from pandas:

import pandas as pd

# Format of the dates in 'MyData.csv'
DT_FMT = '%m-%d-%Y %H:%M:%S'

df = pd.read_csv('MyData.csv')

# Parse dates with the provided format
for c in ('StartDate', 'EndDate'):
    df[c] = pd.to_datetime(df[c], format=DT_FMT)

# Create the DataFrame with the ranges of dates
date_df = pd.DataFrame(
    data=[[d] + list(row[1:])
          for row in df.itertuples(index=False, name=None)
          for d in pd.date_range(row[0], row[1])],
    columns=df.columns.copy()
)

# Convert dates to strings in the same format of 'MyData.csv'
for c in ('StartDate', 'EndDate'):
    date_df[c] = date_df[c].dt.strftime(DT_FMT)

If df is:

   StartDate    EndDate   A   B   C
0 2020-01-02 2020-01-06  ME  ME  ME
1 2021-05-15 2021-05-18  KI  KI  KI

then date_df will be:

             StartDate              EndDate   A   B   C
0  01-02-2020 00:00:00  01-06-2020 00:00:00  ME  ME  ME
1  01-03-2020 00:00:00  01-06-2020 00:00:00  ME  ME  ME
2  01-04-2020 00:00:00  01-06-2020 00:00:00  ME  ME  ME
3  01-05-2020 00:00:00  01-06-2020 00:00:00  ME  ME  ME
4  01-06-2020 00:00:00  01-06-2020 00:00:00  ME  ME  ME
5  05-15-2021 00:00:00  05-18-2021 00:00:00  KI  KI  KI
6  05-16-2021 00:00:00  05-18-2021 00:00:00  KI  KI  KI
7  05-17-2021 00:00:00  05-18-2021 00:00:00  KI  KI  KI
8  05-18-2021 00:00:00  05-18-2021 00:00:00  KI  KI  KI

Then you can save the result back to a CSV file with the to_csv method.
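A minimal sketch of that last step, assuming the dates have already been converted to strings as above. The small DataFrame and the in-memory buffer are stand-ins for illustration only; in practice you would pass a file path to to_csv instead of the buffer.

```python
import io

import pandas as pd

# Stand-in for the date_df built above (dates are already formatted strings)
date_df = pd.DataFrame({
    'StartDate': ['01-02-2020 00:00:00', '01-03-2020 00:00:00'],
    'EndDate': ['01-03-2020 00:00:00', '01-03-2020 00:00:00'],
    'A': ['ME', 'ME'],
})

# Write to an in-memory buffer; pass a file path instead to write a real file
buffer = io.StringIO()
date_df.to_csv(buffer, index=False)  # index=False omits the 0, 1, 2... row labels
csv_text = buffer.getvalue()
print(csv_text)
```

Because the date columns are plain strings at this point, to_csv writes them exactly as shown, so no date_format argument is needed here.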

Does something like this achieve what you want?

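from datetime import datetime, timedelta

import pandas as pd

# Format of the dates in 'MyData.csv' (assumed to be day-first)
FMT = "%d-%m-%Y %H:%M:%S"

df = pd.read_csv('MyData.csv')

date_lists = []
for base, end in zip(df['StartDate'], df['EndDate']):
    d1 = datetime.strptime(base, FMT)
    d2 = datetime.strptime(end, FMT)
    numdays = (d2 - d1).days
    # Count forward from the start date, including the end date itself
    date_lists.append([d1 + timedelta(days=x) for x in range(numdays + 1)])

# One list of dates per original row, then one output row per date
df['Dates'] = date_lists
df = df.explode('Dates')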
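To check the day arithmetic on its own, here is a small self-contained sketch using the example dates from the question. It assumes the dates are day-first, and the helper name days_between is hypothetical, introduced only for this illustration.

```python
from datetime import datetime, timedelta

FMT = "%d-%m-%Y %H:%M:%S"  # assumption: day-first dates


def days_between(start, end):
    """Return every date from start to end (both inclusive) as formatted strings."""
    d1 = datetime.strptime(start, FMT)
    d2 = datetime.strptime(end, FMT)
    numdays = (d2 - d1).days
    # range(numdays + 1) so that the end date itself is included
    return [(d1 + timedelta(days=x)).strftime(FMT) for x in range(numdays + 1)]


dates = days_between("01-02-2020 00:00:00", "05-02-2020 00:00:00")
print(dates)
```

Formatting with the same FMT on the way out keeps the strings in the same shape as the input.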

Actually, the code you provided works for me. I guess the only thing you need to change is the date formatting when reading and writing, to make sure it is consistent with your requirements. In particular, you should use the dayfirst argument when reading and the date_format argument when writing the output file. A toy example below:

Toy data

StartDate            EndDate              A   B   C
01-02-2020 00:00:00  06-02-2020 00:00:00  ME  ME  ME
01-04-2020 00:00:00  04-04-2020 00:00:00  PE  PE  PE

Sample code

import pandas as pd

# dayfirst=True reads 01-02-2020 as 1 February 2020
df = pd.read_csv('dataSO.csv', parse_dates=[0, 1], dayfirst=True)
cols = df.columns

# One range of dates per row, then one output row per date
df['Dates'] = [pd.date_range(x, y) for x, y in zip(df['StartDate'], df['EndDate'])]
df1 = df.explode('Dates')
# Replace StartDate with the incremented date before dropping the helper column
df1['StartDate'] = pd.to_datetime(df1['Dates'])
df1 = df1[cols]
df1.to_csv('resSO.csv', date_format="%d-%m-%Y %H:%M:%S", index=False)

The output is what you described, except that StartDate is also in datetime format. Does this answer your question?
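To see why dayfirst matters, here is a short sketch of how pandas parses the same ambiguous string with and without it. The sample string comes from the question; the variable names are only for illustration.

```python
import pandas as pd

raw = '01-02-2020 00:00:00'
fmt = '%d-%m-%Y %H:%M:%S'

# Without hints, this ambiguous string is read month-first: 2 January 2020
month_first = pd.to_datetime(raw)

# dayfirst=True reads it as 1 February 2020
day_first = pd.to_datetime(raw, dayfirst=True)

# An explicit format removes the ambiguity altogether
explicit = pd.to_datetime(raw, format=fmt)

print(month_first, day_first, explicit)
# Formatting with the same pattern reproduces the original string
print(day_first.strftime(fmt))
```

If the file really is month-first, drop dayfirst and use a month-first format string instead; the point is only that reading and writing should use the same convention.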
