Extracting data belonging to a day from a given range of dates on a dataset

I have a data set with some values, covering a date range from January 12th to August 3rd of 2018.

The dimensionality of the my_df DataFrame is:

my_df.shape 
(9752, 2)

Each row is a reading at half-hour frequency.

The first row begins at 2018-01-12:

my_df.iloc[0]
Date:       2018-01-12 00:17:28
Value                      1
Name: 0, dtype: object

And the last row ends at 2018-08-03:

my_df.tail(1)
                  Date:     Value
9751    2018-08-03 23:44:59  1

My goal is to select the rows corresponding to each day and export each day's rows to its own CSV file.

To get only the January 12th data and save it to a readable file, I run:

# Select the rows of a single day (January 12th) and save them
my_df_Jan12 = my_df[(my_df['Date:'] >= '2018-01-12 00:00:00')
                    & (my_df['Date:'] <= '2018-01-12 23:59:59')]
my_df_Jan12.to_csv('Data_Jan_12.csv', sep=',', header=True, index=False)

From January 12 to August 03 there are 203 days (29 weeks).

I don't want to run this query manually for each day, so I started from the following basic analysis:

  • I need to generate 203 files (one file per day)
  • The days of January start on the 12th (January 12)
  • January is the first month (01) and August is the eighth month (08)

Then:

  • I need to iterate over all 203 days
    • and for each row's date I need to check the month and day values, in order to detect when each of them changes
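
As a quick check of the day count, here is a minimal sketch using only the standard library. The names start, end, span and days are illustrative and not part of the original code. It suggests that the difference between the two dates is 203 days, while counting both January 12 and August 3 gives 204 calendar days, so 204 daily files could be expected if every day has data.

```python
from datetime import date, timedelta

start = date(2018, 1, 12)   # first day present in the data
end = date(2018, 8, 3)      # last day present in the data

# Difference between the two dates, in days
span = (end - start).days

# Every calendar day from start to end, both endpoints included
days = [start + timedelta(days=n) for n in range(span + 1)]

print(span)        # 203
print(len(days))   # 204
print(days[0], days[-1])
```

Iterating over a list of real dates like this avoids building month and day numbers by hand.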

According to the above, I am trying this approach:

# Selecting data value of each day (203 days)
for i in range(203):
    for j in range(1, 9):  # month
        for k in range(12, 32):  # days of the month
            values = my_df[(my_df['Date:'] >= '2018-0{}-{} 00:00:00'.format(j, k))
                           & (my_df['Date:'] <= '2018-0{}-{} 23:59:59'.format(j, k))]
            values.to_csv('Values_day_{}.csv'.format(i), sep=',', header=True, index=False)

But I have a problem: when I iterate over range(12,32) for the days of the month, I think that range only makes sense for the first month, January ...

In the end I get 203 empty CSV files, so I am doing something wrong ...

How can I address this small challenge in a suitable way? Any guidance is highly appreciated.
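
One likely explanation for the empty files, sketched below without pandas: the file name depends only on i, so for a given i every (j, k) combination overwrites the same file, and the last combination is month 8, day 31. The data ends on 2018-08-03, so that final filter matches no rows. The sketch replays only the loop structure, with 3 outer iterations instead of 203; last_written and writes are illustrative names.

```python
# Replay only the loop structure of the attempt above, without pandas,
# recording which date string is the last one written to each file.
last_written = {}
writes = 0
for i in range(3):                 # 3 instead of 203, to keep the demo small
    for j in range(1, 9):          # months 1..8
        for k in range(12, 32):    # days 12..31, for every month
            # The file name ignores j and k, so each write replaces the previous one
            last_written['Values_day_{}.csv'.format(i)] = '2018-0{}-{}'.format(j, k)
            writes += 1

print(last_written)   # every file ends up holding the result for 2018-08-31
print(writes)         # 480 writes for only 3 files
```

The same loop also never visits days 1 to 11 of any month, and it builds date strings such as 2018-02-31 that do not exist in the calendar.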

Something like this? I renamed your original column Date: to Timestamp. I am also assuming that the Date: Series you have is a pandas datetime series.

my_df.columns = ['Timestamp', 'Value']
my_df['Date'] = my_df['Timestamp'].apply(lambda x: x.date())
dates = my_df['Date'].unique()
for date in dates:
    f_name = str(date) + '.csv'
    my_df[my_df['Date'] == date].to_csv(f_name)
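
If the timestamp column turns out to hold plain strings rather than datetimes, the assumption above can be made explicit. The following is a minimal sketch on a small stand-in for my_df (three made-up rows, not the real data); it converts the column only when needed and uses the .dt.date accessor as a vectorized alternative to the lambda.

```python
import pandas as pd

# Stand-in for my_df, with timestamps stored as plain strings (hypothetical rows)
my_df = pd.DataFrame({
    'Timestamp': ['2018-01-12 00:17:28', '2018-01-12 00:47:28', '2018-01-13 00:17:28'],
    'Value': [1, 2, 3],
})

# Convert only if the column is not already a datetime64 column
if not pd.api.types.is_datetime64_any_dtype(my_df['Timestamp']):
    my_df['Timestamp'] = pd.to_datetime(my_df['Timestamp'])

# Vectorized equivalent of .apply(lambda x: x.date())
my_df['Date'] = my_df['Timestamp'].dt.date

print(my_df['Date'].nunique())   # 2
```

After this step the loop over my_df['Date'].unique() works as written above.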

groupby

for date, d in df.groupby(pd.Grouper(key='Date', freq='D')):
  d.to_csv(f"Data_{date:%b_%d}.csv", index=False)

Notice I used an f-string, which requires Python 3.6+.
Otherwise, use this:

for date, d in df.groupby(pd.Grouper(key='Date', freq='D')):
  d.to_csv("Data_{:%b_%d}.csv".format(date), index=False)

Consider the df:

import pandas as pd

df = pd.DataFrame(dict(
    Date=pd.date_range('2010-01-01', periods=10, freq='12H'),
    Value=range(10)
))

Then:

for date, d in df.groupby(pd.Grouper(key='Date', freq='D')):
  d.to_csv(f"Data_{date:%b_%d}.csv", index=False)

And verify:

from pathlib import Path

print(*map(Path.read_text, Path('.').glob('Data*.csv')), sep='\n')

Date,Value
2010-01-05 00:00:00,8
2010-01-05 12:00:00,9

Date,Value
2010-01-04 00:00:00,6
2010-01-04 12:00:00,7

Date,Value
2010-01-02 00:00:00,2
2010-01-02 12:00:00,3

Date,Value
2010-01-01 00:00:00,0
2010-01-01 12:00:00,1

Date,Value
2010-01-03 00:00:00,4
2010-01-03 12:00:00,5
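
One caveat worth checking on real data: the daily bins built by pd.Grouper span the whole date range, so a day with no rows may come through as an empty group and produce a header-only CSV. The sketch below, on a hypothetical three-row sample with a gap on January 13, skips such groups. The names out_dir and written are illustrative, and a temporary folder is used so the demo does not write into the working directory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample with a gap: no rows at all on 2018-01-13
df = pd.DataFrame({
    'Date': pd.to_datetime(['2018-01-12 00:17:28', '2018-01-12 00:47:28',
                            '2018-01-14 23:44:59']),
    'Value': [1, 2, 3],
})

out_dir = tempfile.mkdtemp()   # temporary output folder for the demo

written = []
for date, d in df.groupby(pd.Grouper(key='Date', freq='D')):
    if d.empty:
        continue               # skip days without any rows
    name = 'Data_{:%b_%d}.csv'.format(date)
    d.to_csv(os.path.join(out_dir, name), index=False)
    written.append(name)

print(written)
```

Note that the %b_%d pattern leaves out the year, which is fine for a single-year data set like this one but would make file names collide across years.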
