简体   繁体   中英

How do I create a new column with a set timeframe using Pandas datetime64

I'm trying to look at some sales data for a small store. I have a time stamp of when the settlement was made, but sometimes it's done before midnight and sometimes its done after midnight.

This is giving me data correct for some days and incorrect for others, as anything after midnight should be for the day before. I couldn't find the correct pandas documentation for what I'm looking for.

Is there an if else solution to create a new column, loop through the NEW_TIMESTAMP column and set a custom timeframe (if after midnight, but before 3pm: set the day before ; else set the day). Every time I write something it either runs forever, or it crashes jupyter.

Data: 数据如下所示:

What I did is I created another series which says when a day should be offset back by one day, and I multiplied it by a pd.timedelta object, such that 0 turns into "0 days" and 1 turns into "1 day". Subtracting two series gives the right result. Let me know how the following code works for you.

import pandas as pd
import numpy as np


# copied from https://stackoverflow.com/questions/50559078/generating-random-dates-within-a-given-range-in-pandas
def random_dates(start, end, n=15):
    start_u = start.value//10**9
    end_u = end.value//10**9
    return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')

dates = random_dates(start=pd.to_datetime('2020-01-01'),
                     end=pd.to_datetime('2021-01-01'))
timestamps = pd.Series(dates)

# this takes only the hour component of every datetime
hours = timestamps.dt.hour

# this takes only the hour component of every datetime
dates = timestamps.dt.date

# this compares the hours with 15, and returns a boolean if it is smaller
flag_is_day_before = hours < 15

# now you can set the dates by multiplying the 1s and 0s with a day timedelta
new_dates = dates - pd.to_timedelta(1, unit='day') * flag_is_day_before

df = pd.DataFrame(data=dict(timestamps=timestamps, new_dates=new_dates))

print(df)

This outputs

            timestamps   new_dates
0  2020-07-10 20:11:13  2020-07-10
1  2020-05-04 01:20:07  2020-05-03
2  2020-03-30 09:17:36  2020-03-29
3  2020-06-01 16:16:58  2020-06-01
4  2020-09-22 04:53:33  2020-09-21
5  2020-08-02 20:07:26  2020-08-02
6  2020-03-22 14:06:53  2020-03-21
7  2020-03-14 14:21:12  2020-03-13
8  2020-07-16 20:50:22  2020-07-16
9  2020-09-26 13:26:55  2020-09-25
10 2020-11-08 17:27:22  2020-11-08
11 2020-11-01 13:32:46  2020-10-31
12 2020-03-12 12:26:21  2020-03-11
13 2020-12-28 08:04:29  2020-12-27
14 2020-04-06 02:46:59  2020-04-05

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM