
Updating a data frame using rows from another data frame

Thanks in advance for any help. I am trying to update a data frame of zeros that has a datetime index (my trade data frame) using the rows of another data frame (indexed_orders) that fall on the same dates. My code is as follows:

import pandas as pd
import numpy as np
import os
import csv


orders = pd.read_csv('./orders/orders.csv', parse_dates=['Date'], sep=',', dayfirst=True) #load orders from the csv file, parsing the Date column as datetimes
indexed_orders = orders.set_index(['Date']) #set Date as index for orders
print(indexed_orders)

symbol_list = orders['Symbol'].tolist() #creates list of symbols
symbols = list(set(symbol_list)) #gets rid of duplicates in list


dates_list = orders['Date'].tolist() #creates list of order dates
dates_orders = list(set(dates_list)) #gets rid of duplicates in list


start_date = '2011-01-05' #establish date range
end_date = '2011-01-20'

dates = pd.date_range(start_date, end_date) #establish dates from start_date and end_date

trade = pd.DataFrame(0, index = dates, columns = symbols) #establish trade data frame
trade['Cash'] = 0 #add column for future calculations
print(trade)

This prints the following for indexed_orders:

Date         Symbol Order  Shares
2011-01-10   AAPL   BUY    1500
2011-01-13   AAPL  SELL    1500
2011-01-13    IBM   BUY    4000
2011-01-26   GOOG   BUY    1000
2011-02-02    XOM  SELL    4000
2011-02-10    XOM   BUY    4000
2011-03-03   GOOG  SELL    1000
2011-03-03    IBM  SELL    2200
2011-06-03    IBM  SELL    3300
2011-05-03    IBM   BUY    1500
2011-06-10   AAPL   BUY    1200
2011-08-01   GOOG   BUY      55
2011-08-01   GOOG  SELL      55
2011-12-20   AAPL  SELL    1200

And the following for trade:

            GOOG  AAPL  XOM  IBM  Cash
2011-01-05     0     0    0    0     0
2011-01-06     0     0    0    0     0
2011-01-07     0     0    0    0     0
2011-01-08     0     0    0    0     0
2011-01-09     0     0    0    0     0
2011-01-10     0     0    0    0     0
2011-01-11     0     0    0    0     0
2011-01-12     0     0    0    0     0
2011-01-13     0     0    0    0     0
2011-01-14     0     0    0    0     0
2011-01-15     0     0    0    0     0
2011-01-16     0     0    0    0     0
2011-01-17     0     0    0    0     0
2011-01-18     0     0    0    0     0
2011-01-19     0     0    0    0     0
2011-01-20     0     0    0    0     0

I want to update my trade data frame on the dates present in indexed_orders, inserting the number of 'Shares' in the column for the matching 'Symbol' (the AAPL, IBM, GOOG, and XOM columns in trade). I also want the value of 'Shares' to be negative when the 'Order' column in indexed_orders says 'SELL'. In other words, I am looking for code that updates trade so that printing it gives:

            GOOG  AAPL  XOM  IBM  Cash
2011-01-05     0     0    0    0     0
2011-01-06     0     0    0    0     0
2011-01-07     0     0    0    0     0
2011-01-08     0     0    0    0     0
2011-01-09     0     0    0    0     0
2011-01-10     0  1500    0    0     0
2011-01-11     0     0    0    0     0
2011-01-12     0     0    0    0     0
2011-01-13     0 -1500    0 4000     0
2011-01-14     0     0    0    0     0
2011-01-15     0     0    0    0     0
2011-01-16     0     0    0    0     0
2011-01-17     0     0    0    0     0
2011-01-18     0     0    0    0     0
2011-01-19     0     0    0    0     0
2011-01-20     0     0    0    0     0

I suspect some sort of iteration with nested boolean conditions is needed, but I am having a hard time working it out. In particular, I cannot find a way to iterate through the rows and update trade based on the datetime index.

Any help would be GREATLY appreciated.

First, use the Order column to sign the change in shares: positive for BUY, negative for SELL. Then group by Date and Symbol and aggregate by summing the signed shares. This gives one value for each (Date, Symbol) pair that was actually traded. Finally, use unstack to turn the Symbol level into columns, so the result is a table with one row per date and one column per symbol.

import numpy as np
import pandas as pd

df = pd.read_csv('temp.txt', sep='\t')

print(df)

'''
        Date Symbol Order  Shares
0    1/10/11   AAPL   BUY    1500
1    1/13/11   AAPL  SELL    1500
2    1/13/11    IBM   BUY    4000
3    1/26/11   GOOG   BUY    1000
4     2/2/11    XOM  SELL    4000
5    2/10/11    XOM   BUY    4000
6     3/3/11   GOOG  SELL    1000
7     3/3/11    IBM  SELL    2200
8     6/3/11    IBM  SELL    3300
9     5/3/11    IBM   BUY    1500
10   6/10/11   AAPL   BUY    1200
11    8/1/11   GOOG   BUY      55
12    8/1/11   GOOG  SELL      55
13  12/20/11   AAPL  SELL    1200
'''

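# sign the share counts: positive for BUY, negative for SELL
df['SharesChange'] = df.Shares * df.Order.apply(lambda o: 1 if o == 'BUY' else -1)

# sum the signed shares per (Date, Symbol), then pivot Symbol into columns
df = df.groupby(['Date', 'Symbol']).agg({'SharesChange': 'sum'}).unstack().fillna(0)

print(df)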
'''
         SharesChange
Symbol           AAPL  GOOG   IBM   XOM
Date
1/10/11          1500     0     0     0
1/13/11         -1500     0  4000     0
1/26/11             0  1000     0     0
12/20/11        -1200     0     0     0
2/10/11             0     0     0  4000
2/2/11              0     0     0 -4000
3/3/11              0 -1000 -2200     0
5/3/11              0     0  1500     0
6/10/11          1200     0     0     0
6/3/11              0     0 -3300     0
8/1/11              0     0     0     0
'''
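
Note that the table above is indexed by the order dates as plain strings, not by the date range from the question. A minimal sketch of how the same idea could be lined up with the trade frame, assuming the Date column is parsed as datetimes; the inline csv_text stands in for the orders file, and the names csv_text, sign and changes are only illustrative:

```python
import io

import pandas as pd

# Orders copied from the question; this inline text stands in for the CSV file.
csv_text = """Date,Symbol,Order,Shares
2011-01-10,AAPL,BUY,1500
2011-01-13,AAPL,SELL,1500
2011-01-13,IBM,BUY,4000
2011-01-26,GOOG,BUY,1000
2011-02-02,XOM,SELL,4000
2011-02-10,XOM,BUY,4000
2011-03-03,GOOG,SELL,1000
2011-03-03,IBM,SELL,2200
2011-06-03,IBM,SELL,3300
2011-05-03,IBM,BUY,1500
2011-06-10,AAPL,BUY,1200
2011-08-01,GOOG,BUY,55
2011-08-01,GOOG,SELL,55
2011-12-20,AAPL,SELL,1200
"""
orders = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

# Sign the share counts: positive for BUY, negative for SELL.
# (Any other Order value would map to NaN.)
sign = orders['Order'].map({'BUY': 1, 'SELL': -1})
orders['SharesChange'] = orders['Shares'] * sign

# Net change per (Date, Symbol), with one column per symbol.
changes = orders.groupby(['Date', 'Symbol'])['SharesChange'].sum().unstack()

# Align with the date range from the question: order dates outside the range
# are dropped, and dates or symbols without orders are filled with 0.
dates = pd.date_range('2011-01-05', '2011-01-20')
symbols = sorted(orders['Symbol'].unique())
trade = changes.reindex(index=dates, columns=symbols).fillna(0).astype(int)
trade['Cash'] = 0  # placeholder column, as in the question

print(trade)
```

Here reindex does the work that the row-by-row loop in the question was meant to do: it keeps only the dates in the range and inserts all-zero rows for days without orders.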


 