
Efficiently add single row to Pandas Series or DataFrame

I want to use Pandas to work with series in real-time. Every second, I need to add the latest observation to an existing series. My series are grouped into a DataFrame and stored in an HDF5 file.

Here's how I do it at the moment:

>>> from pandas import Series
>>> existing_series = Series([7, 13, 97], [0, 1, 2])
>>> updated_series = existing_series.append(Series([111], [3]))

Is this the most efficient way? I've read countless posts but cannot find any that focus on efficiency with high-frequency data.
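(Side note: `Series.append` was later deprecated and removed in pandas 2.0; `pd.concat` is the replacement. A minimal sketch of the same operation in current pandas:)

```python
import pandas as pd

existing_series = pd.Series([7, 13, 97], index=[0, 1, 2])
# Series.append was removed in pandas 2.0; pd.concat does the same job.
updated_series = pd.concat([existing_series, pd.Series([111], index=[3])])
print(updated_series.tolist())  # [7, 13, 97, 111]
```

Note that, like `append`, this copies the data on every call, so repeated single-row concatenation is still O(n) per insert.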

Edit: I just read about the shelve and pickle modules. They seem to achieve what I'm trying to do, basically saving lists to disk. Because my lists are large, is there any way to avoid loading the full list into memory and instead efficiently append values one at a time?
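(One way to get append-without-loading from shelve is to key each observation by a counter rather than storing one big list; the file path and key scheme below are illustrative, not part of the original question.)

```python
import os
import shelve
import tempfile

# Illustrative on-disk location for the shelf.
path = os.path.join(tempfile.mkdtemp(), "observations")

# Write each observation under its own key: nothing existing is read.
with shelve.open(path) as db:
    for i, value in enumerate([7, 13, 97]):
        db[str(i)] = value

# Appending later only touches the new key, not the earlier entries.
with shelve.open(path) as db:
    db[str(len(db))] = 111

# Reading everything back (only needed when you actually want the data).
with shelve.open(path) as db:
    values = [db[str(i)] for i in range(len(db))]
print(values)  # [7, 13, 97, 111]
```

Pickling one big list would not have this property: the whole object must be loaded, extended, and re-written on every append.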

Take a look at the new PyTables docs in 0.10 (coming soon), or you can get them from master: http://pandas.pydata.org/pandas-docs/dev/whatsnew.html

PyTables is actually pretty good at appending, and writing to an HDFStore every second will work. You want to store a DataFrame table. You can then select data in a query-like fashion, e.g.

store.append('df', the_latest_df)
store.append('df', the_latest_df)
....
store.select('df', [ 'index>12:00:01' ])
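(A fuller sketch of that pattern with current pandas syntax; the file path, column name, and timestamps are illustrative, and the PyTables package must be installed. Modern versions use a `where` string instead of the old `Term` list shown above.)

```python
import os
import tempfile

import pandas as pd

# Illustrative temporary file for the store.
path = os.path.join(tempfile.mkdtemp(), "ticks.h5")
store = pd.HDFStore(path, mode="w")

# Append one small frame per tick; append() writes in table format,
# which is what makes the data queryable afterwards.
idx1 = pd.DatetimeIndex(["2024-01-01 12:00:00", "2024-01-01 12:00:01"])
store.append("df", pd.DataFrame({"price": [7.0, 13.0]}, index=idx1))

idx2 = pd.DatetimeIndex(["2024-01-01 12:00:02"])
store.append("df", pd.DataFrame({"price": [97.0]}, index=idx2))

# Query-like selection on the index, without loading the whole table.
recent = store.select("df", where="index > pd.Timestamp('2024-01-01 12:00:01')")
store.close()
print(recent["price"].tolist())  # [97.0]
```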

If this is all from the same process, then this will work great. If you have one writer process and another process reading, it's a little trickier (but can work correctly, depending on what you are doing).

Another option is to use messaging to transmit the data from one process to another (and then append in memory); this avoids the serialization issue.
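(As a sketch of the messaging idea using the standard library: a writer process pushes each observation onto a `multiprocessing.Queue`, and the reader accumulates them in memory and builds the Series in one shot. The names `producer`, `collect`, and `SENTINEL` are illustrative.)

```python
import multiprocessing as mp

import pandas as pd

SENTINEL = None  # marks the end of the stream


def producer(queue):
    # In a live feed this would run once per second with new data.
    for t, value in enumerate([7, 13, 97, 111]):
        queue.put((t, value))
    queue.put(SENTINEL)


def collect(queue):
    rows = []
    while True:
        item = queue.get()
        if item is SENTINEL:
            break
        rows.append(item)
    # Build the Series once, instead of appending row by row.
    index, values = zip(*rows)
    return pd.Series(values, index=index)


if __name__ == "__main__":
    q = mp.Queue()
    writer = mp.Process(target=producer, args=(q,))
    writer.start()
    series = collect(q)
    writer.join()
    print(series.tolist())  # [7, 13, 97, 111]
```

In a real setup the two functions would live in separate programs connected by a socket or a broker (e.g. ZeroMQ), but the shape is the same: ship small tuples, accumulate in memory, and only build or persist the pandas object periodically.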
