Extracting the data belonging to each day from a given range of dates in a dataset
I have a dataset with values covering a date range from January 12th to August 3rd, 2018:
The dimensionality of the `my_df` DataFrame is:
```
>>> my_df.shape
(9752, 2)
```
Each row is a reading taken at half-hour frequency. The first row begins on 2018-01-12:
```
>>> my_df.iloc[0]
Date:    2018-01-12 00:17:28
Value                      1
Name: 0, dtype: object
```
And the last row ends on 2018-08-03:
```
>>> my_df.tail(1)
                    Date:  Value
9751  2018-08-03 23:44:59      1
```
My goal is to select the rows corresponding to each day and export each day to its own CSV file.
To get only the January 12th data and save it to a readable file, I do:
```python
# Selecting data value of each day
my_df_Jan12 = my_df[(my_df['Fecha:'] >= '2018-01-12 00:00:00') &
                    (my_df['Fecha:'] <= '2018-01-12 23:59:59')]
my_df_Jan12.to_csv('Data_Jan_12.csv', sep=',', header=True, index=False)
```
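A slightly more robust way to express the same one-day filter is a half-open interval, `>= day` and `< next day`, which also catches timestamps with fractional seconds after 23:59:59. A minimal sketch, assuming the timestamp column has already been parsed as `datetime64`; the column name `Fecha:` comes from the question, while the synthetic data and the helper `select_day` are hypothetical:

```python
import pandas as pd

# Hypothetical stand-in for my_df: half-hourly timestamps covering three days
my_df = pd.DataFrame({
    "Fecha:": pd.date_range("2018-01-12 00:17:28", periods=144, freq="30min"),
    "Value": 1,
})

def select_day(df, day, col="Fecha:"):
    """Return the rows whose timestamp falls on the calendar day `day`."""
    start = pd.Timestamp(day)                    # midnight at the start of the day
    end = start + pd.Timedelta(days=1)           # midnight at the start of the next day
    mask = (df[col] >= start) & (df[col] < end)  # half-open interval [start, end)
    return df[mask]

jan12 = select_day(my_df, "2018-01-12")
print(len(jan12))  # 48 half-hourly rows
# jan12.to_csv('Data_Jan_12.csv', index=False) would then export that day
```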
From January 12 to August 3 there are 203 days (29 weeks); counting both endpoints, the data covers 204 calendar days.
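The day count can be checked with the standard library's `datetime.date` arithmetic:

```python
from datetime import date

start = date(2018, 1, 12)
end = date(2018, 8, 3)

elapsed = (end - start).days  # days between the two dates
n_days = elapsed + 1          # calendar days covered, counting both endpoints
print(elapsed, elapsed / 7, n_days)  # 203 29.0 204
```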
I don't want to run this query manually for each day, so I tried the following basic analysis:
Then:
Based on the above, I am trying this approach:
```python
# Selecting data value of each day (203 days)
for i in range(203):
    for j in range(1, 9):  # month
        for k in range(12, 32):  # days of the month
            values = my_df[(my_df['Fecha:'] >= '2018-0{}-{} 00:00:00'.format(j, k)) &
                           (my_df['Fecha:'] <= '2018-0{}-{} 23:59:59'.format(j, k))]
            values.to_csv('Values_day_{}.csv'.format(i), sep=',', header=True, index=False)
```
But I think my problem is that when I iterate over `range(12,32)` for the days of each month, this range only makes sense for the first month, January.
In the end I get 203 empty CSV files, so I am clearly doing something wrong.
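To see why the files come out empty, one can rebuild the date strings the nested loops generate without touching the data. A minimal sketch, assuming `to_csv` sits in the innermost loop (the original indentation was lost): the file name depends only on `i`, so every `(j, k)` pass overwrites `Values_day_{i}.csv`, and the slice that survives is the last one, `2018-08-31`, which lies after the data ends. The loops also skip days 1 to 11 of every month and build impossible dates such as `2018-02-30`:

```python
# Rebuild the date strings produced by one pass of the question's inner loops
day_strings = [
    '2018-0{}-{}'.format(j, k)
    for j in range(1, 9)    # months January..August
    for k in range(12, 32)  # days 12..31 in every month
]

print(len(day_strings))             # 160 strings per value of i
print(day_strings[-1])              # 2018-08-31: the slice left in every file
print('2018-02-30' in day_strings)  # True: invalid dates are generated
print('2018-02-05' in day_strings)  # False: days 1-11 are never visited
```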
How can I solve this properly? Any guidance is highly appreciated.
Something like this? I renamed your original `Date:` column to `Timestamp`. I am also assuming that your `Date:` Series is a pandas datetime Series.
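```python
my_df.columns = ['Timestamp', 'Value']
# Calendar date (without the time of day) for every row
my_df['Date'] = my_df['Timestamp'].dt.date
dates = my_df['Date'].unique()
# One CSV per day, named after the date, e.g. 2018-01-12.csv
for date in dates:
    f_name = str(date) + '.csv'
    my_df[my_df['Date'] == date].to_csv(f_name)
```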
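The same idea can be written as a `groupby` on the calendar date, which avoids re-filtering the whole frame once per day; only dates that actually occur in the data produce a group. A minimal, self-contained sketch, assuming a `datetime64` column named `Timestamp` as in this answer; the synthetic data and the scratch directory are hypothetical:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for my_df: two days of half-hourly readings
my_df = pd.DataFrame({
    "Timestamp": pd.date_range("2018-01-12 00:17:28", periods=96, freq="30min"),
    "Value": 1,
})

out_dir = Path(tempfile.mkdtemp())  # write into a scratch directory

# Grouping by the calendar date yields one (date, rows) pair per day in the data
for day, rows in my_df.groupby(my_df["Timestamp"].dt.date):
    rows.to_csv(out_dir / f"{day}.csv", index=False)

written = sorted(p.name for p in out_dir.glob("*.csv"))
print(written)  # ['2018-01-12.csv', '2018-01-13.csv']
```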
Use `groupby`:
```python
for date, d in df.groupby(pd.Grouper(key='Date', freq='D')):
    d.to_csv(f"Data_{date:%b_%d}.csv", index=False)
```
Note that I used an f-string, which requires Python 3.6+.
Otherwise, use `str.format`:
```python
for date, d in df.groupby(pd.Grouper(key='Date', freq='D')):
    d.to_csv("Data_{:%b_%d}.csv".format(date), index=False)
```
Consider the `df`:
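```python
import pandas as pd

df = pd.DataFrame(dict(
    # Two readings per day; newer pandas expects the lowercase '12h' alias
    Date=pd.date_range('2010-01-01', periods=10, freq='12h'),
    Value=range(10)
))
```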
Then:
```python
for date, d in df.groupby(pd.Grouper(key='Date', freq='D')):
    d.to_csv(f"Data_{date:%b_%d}.csv", index=False)
```
And verify:
```python
from pathlib import Path
print(*map(Path.read_text, Path('.').glob('Data*.csv')), sep='\n')
```
```
Date,Value
2010-01-05 00:00:00,8
2010-01-05 12:00:00,9

Date,Value
2010-01-04 00:00:00,6
2010-01-04 12:00:00,7

Date,Value
2010-01-02 00:00:00,2
2010-01-02 12:00:00,3

Date,Value
2010-01-01 00:00:00,0
2010-01-01 12:00:00,1

Date,Value
2010-01-03 00:00:00,4
2010-01-03 12:00:00,5
```