Efficiently adding rows to pandas DataFrame

Question

I'm trying to create a simple backtester on python which allows me to assess the performance of a trading strategy. As part of the backtester, I need to record the transactions which occur. Eg,

Day     Stock    Action     Quantity
1       AAPL     BUY        20
2       CSCO     SELL       30
2       AMZN     SELL       50

During the trading simulation, I'll need to add more transactions.

What's the most efficient way to do this. Should I create a transactions list at the start of the simulation and append lists such as [5, 'AAPL', 'BUY', 20] as I go. Should I instead use a dictionary, or a numpy array? Or just a Pandas DataFrame directly?

Thanks,

Jack

Answer 1

list.append operations are amortised constant time operations, because it just involves shifting pointers around.

OTOH, numpy.ndarray and pd.DataFrame objects are internally represented as arrays in C, and those are immutable. Each time you "append" to an array/dataframe, you have to reallocate new memory for an entire copy of the old data plus the appended, and ends up being linear in complexity.

So, as @ayhan said in a comment , accumulate your data in a list , and then load into a dataframe once you're done.

Efficiently adding rows to pandas DataFrame

Question

1 answers

solution1
4 ACCPTED 2018-01-17 12:29:32

Efficiently adding rows to pandas DataFrame

Question

1 answers

solution1 4 ACCPTED 2018-01-17 12:29:32

solution1
4 ACCPTED 2018-01-17 12:29:32