
How to use the join method on a pandas DataFrame

I have this line of code, which takes the last value of the previous day and repeats it on every row of the next day in a new column. It works fine.

df = df.join(df.resample('B', on='Date')['x'].last().rename('xnew'), on=pd.to_datetime((df['Date'] - pd.tseries.offsets.BusinessDay()).dt.date))

Now I need something similar, but I can't get it working.
I need the first value of each day in 'Open', copied into every row of that day in a new column 'opening'.
I tried this, but it doesn't work:

df = df.join(df.resample('B', on='Date')['Open'].last().rename('opening'), on=pd.to_datetime((df['Date'])))

error:

ValueError: columns overlap but no suffix specified: Index(['opening'], dtype='object')

How could I accomplish this?

With:

opening = df.resample('B', on='Date')['Open'].first()

Date
2019-06-20    2927.25
2019-06-21    2932.75
2019-06-24    2942.00
2019-06-25    2925.00
2019-06-26    2902.75
               ...   
2020-06-17    3116.50
2020-06-18    3091.50
2020-06-19    3101.75
2020-06-22    3072.75
2020-06-23    3111.25

I get the first value of each day, and my desired output is:

        Date                 Open       opening
1       2020-06-24 07:00:00  3091.50    3111.25  
2       2020-06-24 07:05:00  3092.50    3111.25
3       2020-06-24 07:10:00  3090.25    3111.25
4       2020-06-24 07:15:00  3089.75    3111.25

Here's some sample data. For this example, each day only runs from 07:00 to 07:15:

           Time             Open
Date        
2019-06-20 07:00:00 70000   2927.25
2019-06-20 07:05:00 70500   2927.00
2019-06-20 07:10:00 71000   2927.00
2019-06-20 07:15:00 71500   2926.75
2019-06-21 07:00:00 70000   2932.75
2019-06-21 07:05:00 70500   2932.25
2019-06-21 07:10:00 71000   2933.00
2019-06-21 07:15:00 71500   2930.75
2019-06-24 07:00:00 70000   2942.00
2019-06-24 07:05:00 70500   2941.50
2019-06-24 07:10:00 71000   2942.00
2019-06-24 07:15:00 71500   2941.50
2019-06-25 07:00:00 70000   2925.00
2019-06-25 07:05:00 70500   2925.75
2019-06-25 07:10:00 71000   2926.50
2019-06-25 07:15:00 71500   2926.00
2019-06-26 07:00:00 70000   2902.75
2019-06-26 07:05:00 70500   2903.00
2019-06-26 07:10:00 71000   2904.00
2019-06-26 07:15:00 71500   2904.25

I started with an approach similar to yours, using resample. What I added is a shift of the daily values, so that each value is indexed by the next business day. I can then feed these values to Series.map, applied to the date of each row.

Here is the code:

df['opening'] = df.Date.dt.date.map(df.resample('B', on='Date').Open.first().shift())

    Date                Open    opening
0   2019-06-20 07:00:00 2927.25 
1   2019-06-20 07:05:00 2927.0  
2   2019-06-20 07:10:00 2927.0  
3   2019-06-20 07:15:00 2926.75 
4   2019-06-21 07:00:00 2932.75 2927.25
5   2019-06-21 07:05:00 2932.25 2927.25
6   2019-06-21 07:10:00 2933.0  2927.25
7   2019-06-21 07:15:00 2930.75 2927.25
8   2019-06-24 07:00:00 2942.0  2932.75
9   2019-06-24 07:05:00 2941.5  2932.75
10  2019-06-24 07:10:00 2942.0  2932.75
11  2019-06-24 07:15:00 2941.5  2932.75
12  2019-06-25 07:00:00 2925.0  2942.0

Of course, the rows of the first day will have NaN, since there is no previous day to take the value from.
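For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea on a few rows copied from the sample data. It makes one substitution that is my assumption, not part of the answer above: the lookup keys are built with `dt.normalize()` instead of `dt.date`, so they stay as timestamps and match the `DatetimeIndex` produced by `resample`. The names `daily_open` and `prev_open` are only illustrative.

```python
import pandas as pd

# A few rows in the same shape as the question's sample data.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-06-20 07:00:00", "2019-06-20 07:05:00",
        "2019-06-21 07:00:00", "2019-06-21 07:05:00",
        "2019-06-24 07:00:00", "2019-06-24 07:05:00",
    ]),
    "Open": [2927.25, 2927.00, 2932.75, 2932.25, 2942.00, 2941.50],
})

# First 'Open' of each business day; the index holds midnight timestamps.
daily_open = df.resample("B", on="Date")["Open"].first()

# Shift by one row so each business day carries the previous day's open.
prev_open = daily_open.shift()

# normalize() floors every timestamp to midnight, so the keys line up
# with the index of prev_open (assumption: swapped in for dt.date).
df["opening"] = df["Date"].dt.normalize().map(prev_open)

print(df)
```

If the same day's first value is wanted instead of the previous day's, mapping `daily_open` directly (without the shift) should do it. Note that a business day with no rows at all, such as a holiday, ends up as NaN in `daily_open`, and the shift then passes that NaN on to the following day.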
