How to add a new column by searching for data in a Pandas time series dataframe

Question

I have a Pandas time series dataframe. It has minute data for a stock for 30 days. I want to create a new column holding the price of the stock at noon for that day, e.g. for all rows for January 1, I want a new column with the price at noon on January 1, for all rows for January 2, the price at noon on January 2, and so on.

Existing dataframe (left) and desired result (right):
Date    Time   Last_Price    Date    Time   12amT
1/1/19  08:00  100           1/1/19  08:00  ?
1/1/19  08:01  101           1/1/19  08:01  ?
1/1/19  08:02  100.50        1/1/19  08:02  ?
...
31/1/19 21:00  106           31/1/19 21:00  ?

I used this hack, but it is very slow, and I assume there is a quicker and easier way to do this.

for lab, row in df.iterrows():
    t = row["Date"]
    # Look up the 12:00 price for this row's date and write it into the new column
    df.loc[lab, "12amT"] = df[(df["Date"] == t) & (df["Time"] == "12:00")]["Last_Price"].values[0]

Answer 1

One way to do this is to use groupby with pd.Grouper:

For pandas 0.24.1+ (Series.to_numpy was added in 0.24):

df.groupby(pd.Grouper(freq='D'))[0]\
  .transform(lambda x: x.loc[(x.index.hour == 12) & 
                             (x.index.minute==0)].to_numpy()[0])

For older pandas versions, use:

df.groupby(pd.Grouper(freq='D'))[0]\
  .transform(lambda x: x.loc[(x.index.hour == 12) &
                             (x.index.minute==0)].values[0])

MCVE:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(48*60), index=pd.date_range('02-01-2019', periods=(48*60), freq='min'))

df['12amT'] = df.groupby(pd.Grouper(freq='D'))[0].transform(lambda x: x.loc[(x.index.hour == 12)&(x.index.minute==0)].to_numpy()[0])

Output (head):

                    0  12amT
2019-02-01 00:00:00  0    720
2019-02-01 00:01:00  1    720
2019-02-01 00:02:00  2    720
2019-02-01 00:03:00  3    720
2019-02-01 00:04:00  4    720
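
The example above uses a DatetimeIndex and a column labelled 0, whereas the question's frame has separate Date and Time string columns. Here is a minimal sketch of how the same groupby could be applied to that layout. It assumes day-first dates with two-digit years (as in 31/1/19) and HH:MM times. The toy prices and the helper name price_at_noon are made up for illustration. Unlike the lambda above, the helper returns NaN for a day with no 12:00 row instead of raising an IndexError.

```python
import pandas as pd

# Toy frame in the question's layout (prices are made up for illustration).
df = pd.DataFrame({
    "Date": ["1/1/19", "1/1/19", "1/1/19", "2/1/19", "2/1/19"],
    "Time": ["08:00", "12:00", "12:01", "08:00", "08:01"],
    "Last_Price": [100.0, 101.0, 100.5, 102.0, 103.0],
})

# Combine the two string columns into timestamps and use them as the index.
# The format string is an assumption: day/month/two-digit year, 24-hour time.
df.index = pd.to_datetime(df["Date"] + " " + df["Time"], format="%d/%m/%y %H:%M")


def price_at_noon(prices):
    """Return the 12:00 price from one day's prices, or NaN if there is none."""
    hit = prices[(prices.index.hour == 12) & (prices.index.minute == 0)]
    return hit.iloc[0] if len(hit) else float("nan")


# transform broadcasts the per-day scalar back onto every row of that day.
df["12amT"] = df.groupby(pd.Grouper(freq="D"))["Last_Price"].transform(price_at_noon)
```

In this toy data, 2/1/19 has no 12:00 row, so its rows get NaN rather than stopping the whole computation.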

Answer 2

I'm not sure why you have two DateTime columns, so I made my own example to demonstrate:

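import numpy as np
import pandas as pd

ind = pd.date_range('2019-01-01', '2019-01-30', freq='h')
df = pd.DataFrame({'Last_Price': np.random.random(len(ind)) + 100}, index=ind)

def noon_price(day):
    # Price at 12:00 for this day's rows, or NaN if the day has no noon row
    noon = day.loc[day.index.hour == 12, 'Last_Price'].values
    day = day.copy()
    day['noon_price'] = noon[0] if len(noon) > 0 else np.nan
    return day

# group_keys=False keeps the original DatetimeIndex instead of adding a group level
df = df.groupby(df.index.day, group_keys=False).apply(noon_price).reindex(ind)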

Assigning the scalar noon price inside the function broadcasts it to every row of that day's group; reindex(ind) then restores the original row order.
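
Grouping by df.index.day uses the day of the month, so data that spans more than one month would merge, say, January 2 with February 2. A minimal sketch of a variant that groups by calendar date instead and avoids apply is below. It is a swapped-in technique (where plus transform("first")), not the code from the answer, and the toy prices are made up.

```python
import numpy as np
import pandas as pd

# Hourly toy data crossing a month boundary (Jan 30 to Feb 2); prices are made up.
ind = pd.date_range("2019-01-30", periods=4 * 24, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"Last_Price": np.arange(len(ind), dtype=float)}, index=ind)

# Keep the price only on the 12:00 rows; every other row becomes NaN.
# For minute data the mask would also need (df.index.minute == 0).
at_noon = df["Last_Price"].where(df.index.hour == 12)

# normalize() truncates each timestamp to midnight, giving one key per calendar date.
# "first" takes the first non-NaN value in each group, and transform broadcasts it.
df["noon_price"] = at_noon.groupby(df.index.normalize()).transform("first")
```

A day without a noon row ends up with NaN in noon_price, since its group contains no non-NaN value.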

To add a column with the next day's noon price, you can shift the column up by 24 rows (one day of hourly data), like this:

df['T+1'] = df.noon_price.shift(-24)
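
shift(-24) relies on every day having exactly 24 rows, which does not hold for the minute data in the question. A minimal sketch of an alternative is below: build one noon price per day, shift that daily series by one entry, and look the result up for each row. The toy data (three days, 08:00 to 16:00, made-up prices) and the name noon_by_day are assumptions for illustration. Note that the shift here moves to the next day present in the data, not necessarily the next calendar day.

```python
import numpy as np
import pandas as pd

# Toy minute data: three days, 08:00 to 16:00 inclusive (prices are made up).
days = pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03"])
offsets = pd.to_timedelta(np.arange(8 * 60, 16 * 60 + 1), unit="min")
idx = pd.DatetimeIndex([d + o for d in days for o in offsets])
df = pd.DataFrame({"Last_Price": np.arange(len(idx), dtype=float)}, index=idx)

# Calendar date of every row (timestamps truncated to midnight).
day = df.index.normalize()

# One noon price per day, indexed by that day's date.
is_noon = (df.index.hour == 12) & (df.index.minute == 0)
noon_by_day = df.loc[is_noon, "Last_Price"]
noon_by_day.index = noon_by_day.index.normalize()

# Look up each row's own day, and the following day's value via a one-step shift.
df["noon_price"] = noon_by_day.reindex(day).to_numpy()
df["T+1"] = noon_by_day.shift(-1).reindex(day).to_numpy()
```

The last day in the data has no following day, so its T+1 values are NaN.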
