How can a DataFrame change from having two columns (a “from” datetime and a “to” datetime) to having a single column for a date?
I've got a DataFrame that looks like this:
It has two columns: a "from" datetime and a "to" datetime. I would like to change this DataFrame so that it has a single column or index for the date (e.g. 2015-07-06 00:00:00 in datetime form), with the variables of the other columns (like deep) split proportionately across the days. How might one approach this problem? I've meddled with groupby tricks and I'm not sure how to proceed.
So I don't have time to work through your specific problem at the moment. But the way to approach this is to use pandas' DataFrame.resample(). Here are the steps I would take:
1) Resample your "to" date column by minute.
2) Populate the other columns out over that resample.
3) Add the date column back in as an index.
If this doesn't work or is tricky to work with, I would create a date range from your earliest date to your latest date (at the smallest interval you want, so maybe hourly?) and then run some conditional statements over your other columns to fill in the data.
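As a rough sketch of that date-range fallback, the example below builds one bin per day and gives each day the share of every interval that overlaps it. The sample rows, the `deep` values, and the names `days`, `rows`, and `daily` are made up for illustration, and it assumes every row has a "to" later than its "from" (a zero-length interval would divide by zero).

```python
import pandas as pd

# Hypothetical stand-in for the question's data: one row per interval.
data = pd.DataFrame({
    "from": pd.to_datetime(["2015-07-06 18:00", "2015-07-07 06:00"]),
    "to": pd.to_datetime(["2015-07-07 06:00", "2015-07-07 12:00"]),
    "deep": [120.0, 60.0],
})

# One bin per calendar day, from the earliest "from" to the latest "to".
days = pd.date_range(data["from"].min().normalize(),
                     data["to"].max().normalize(), freq="D")

# Length of each interval in seconds, used to turn overlaps into fractions.
total_seconds = (data["to"] - data["from"]).dt.total_seconds()

rows = []
for day in days:
    day_end = day + pd.Timedelta(days=1)
    # Clip each interval to this day; intervals outside the day come out
    # with a negative length, which is then clipped to zero.
    start = data["from"].clip(lower=day)
    end = data["to"].clip(upper=day_end)
    overlap_seconds = (end - start).dt.total_seconds().clip(lower=0)
    # Each interval contributes its value times the fraction inside this day.
    share = overlap_seconds / total_seconds
    rows.append({"date": day, "deep": (data["deep"] * share).sum()})

daily = pd.DataFrame(rows).set_index("date")
print(daily)
```

Here the first interval runs from 18:00 to 06:00, so half of its value lands on each of the two days. The loop is easy to follow but scans every row once per day, so for large data a vectorized approach would likely be preferable.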
Here is roughly what your code might look like for the resample portion (replace day with hour or whatever):
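import pandas as pd
# Daily range for the fallback approach (adjust the endpoints to your data)
drange = pd.date_range('01-01-1970', '01-20-2018', freq='D')
# resample() needs a DatetimeIndex on data, e.g. data = data.set_index('to')
data = data.resample('D').ffill()
data.index.name = 'date'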
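For something you can run as-is, here is a minimal version of the resample step on made-up data. The two rows and the `deep` values are assumptions, since the original DataFrame isn't shown.

```python
import pandas as pd

# Hypothetical data: the dates and the "deep" values are made up.
data = pd.DataFrame({
    "to": pd.to_datetime(["2015-07-06", "2015-07-09"]),
    "deep": [10.0, 40.0],
})

# resample() needs a DatetimeIndex, so index by the "to" column first,
# then create one row per day and forward-fill the gaps.
daily = data.set_index("to").resample("D").ffill()
daily.index.name = "date"
print(daily)
```

Note that forward-filling copies the last known value into each new day; it does not split values proportionally, so it only covers the "one row per date" part of the question.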
Hope this helps!