Fastest way to store data from Pandas DataFrame

Question

I was checking out "Fastest way to iterate through a pandas dataframe?" and I wasn't sure whether it could be applied to my situation. I want to make a dictionary of the samples and features in the DataFrame:

#DF_gex is a DataFrame

D_sample_Data = {}

class Sample:
    def __init__(self,D_key_value):
        self.D_key_value = D_key_value 

for i in range(DF_gex.shape[0]):
    D_key_value = {}
    sample = DF_gex.index[i]
    for j in range(DF_gex.shape[1]):
        key = DF_gex.columns[j]
        value = DF_gex.iloc[i,j]
        D_key_value[key] = value
    D_sample_Data[sample] = Sample(D_key_value)

I basically have a class called Sample. In the Sample class I store a dictionary for each instance (D_key_value). Right now I'm iterating through every row and every column.

Is there a quicker way of doing this? I know that Pandas is based on Numpy arrays which have special features for indexing. Can one of those ways be used for this?

In the end, I will have a dictionary object D_sample_Data where I input a sample name and get a class instance. In that class instance, there will be a dictionary object unique to that sample key.

Answer 1

If you simply want a dictionary of dictionaries, where the keys of the outer dictionary are the index labels, the keys of the inner dictionaries are the column names, and the values are the corresponding values at that index-column pair (or a dictionary of class instances, each containing such a dictionary), then you don't need loops. You can simply use the DataFrame.to_dict() method. Example -

resultdict = df.T.to_dict()

Or, from Pandas version 0.17.0, you can also use the keyword argument orient='index'. Example -

resultdict = df.to_dict(orient='index')

Demo -

In [73]: df
Out[73]:
   Col1  Col2  Col3
a     1     2     3
b     4     5     6
c     7     8     9

In [74]: df.T.to_dict()
Out[74]:
{'a': {'Col1': 1, 'Col2': 2, 'Col3': 3},
 'b': {'Col1': 4, 'Col2': 5, 'Col3': 6},
 'c': {'Col1': 7, 'Col2': 8, 'Col3': 9}}
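As a quick check that the two spellings above agree, here is a minimal sketch that rebuilds the small frame from the demo and compares both results. The way df is constructed is an assumption (the demo does not show how it was created), and the names by_transpose, by_orient and same are only illustrative.

```python
import pandas as pd

# Assumed construction of the demo frame; only its printed form appears above.
df = pd.DataFrame(
    {"Col1": [1, 4, 7], "Col2": [2, 5, 8], "Col3": [3, 6, 9]},
    index=["a", "b", "c"],
)

# Transpose first, then use the default orient (outer keys = column labels of df.T).
by_transpose = df.T.to_dict()

# Ask for index-keyed output directly (needs pandas 0.17.0 or later).
by_orient = df.to_dict(orient="index")

# Both are {index label: {column label: value}}.
same = by_transpose == by_orient
```

Note that transposing a frame whose columns have different dtypes makes pandas upcast the values to a common dtype, so orient='index' is likely the safer choice when the columns are not all of the same type.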

If you want the values of the outer dictionary to be instances of class Sample, though I highly doubt that is useful at all, then you can do -

class Sample:
    def __init__(self,D_key_value):
        self.D_key_value = D_key_value 

resultdict = df.T.to_dict()

resultdict = {k:Sample(v) for k,v in resultdict.items()}
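Putting this together with the names from the question, a self-contained sketch might look like the following. The contents of DF_gex (sample names, gene columns and values) are made up for illustration, since the question does not show the actual data.

```python
import pandas as pd


class Sample:
    def __init__(self, D_key_value):
        self.D_key_value = D_key_value


# Hypothetical stand-in for DF_gex: rows are samples, columns are features.
DF_gex = pd.DataFrame(
    {"gene_1": [0.5, 1.5], "gene_2": [2.0, 3.0]},
    index=["sample_A", "sample_B"],
)

# One to_dict call replaces the nested row/column loops;
# each inner dict is then wrapped in a Sample instance.
D_sample_Data = {
    sample: Sample(D_key_value)
    for sample, D_key_value in DF_gex.to_dict(orient="index").items()
}

# Look up a sample by name, then a feature inside that sample's dictionary.
value = D_sample_Data["sample_B"].D_key_value["gene_1"]
```

Each sample name then maps to its own Sample instance, and each instance holds a dictionary unique to that sample, which is the structure the question asks for.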
