Fastest way to store data from a Pandas DataFrame

Question

I was looking at "Fastest way to iterate through a pandas dataframe?" and wasn't sure whether it applies to my situation. I want to make a dictionary of the samples and features in the DataFrame.

# DF_gex is a DataFrame

D_sample_Data = {}

class Sample:
    def __init__(self, D_key_value):
        self.D_key_value = D_key_value

for i in range(DF_gex.shape[0]):
    D_key_value = {}
    sample = DF_gex.index[i]
    for j in range(DF_gex.shape[1]):
        key = DF_gex.columns[j]
        value = DF_gex.iloc[i, j]
        D_key_value[key] = value
    # Store a Sample instance for this row (the key does not exist yet,
    # so setting an attribute on D_sample_Data[sample] would raise KeyError)
    D_sample_Data[sample] = Sample(D_key_value)

I basically have a class called Sample, and in each instance I store a dictionary (D_key_value). Right now I'm iterating through every row and every column.

Is there a quicker way of doing this? I know that Pandas is based on NumPy arrays, which have special features for indexing. Can one of those be used for this?

In the end, I will have a dictionary object D_sample_Data where I input a sample name and get a class instance. In that class instance, there will be a dictionary object unique to that sample key.

Answer 1

If you simply want a dictionary of dictionaries, where the keys of the outer dictionary are the index labels, the keys of the inner dictionaries are the column names, and the values are the corresponding values at each index-column position (or a dictionary of class instances, each containing such a dictionary), then you don't need loops. You can simply use the DataFrame.to_dict() method. Example:

resultdict = df.T.to_dict()

Or, from Pandas version 0.17.0 onward, you can also use the keyword argument orient='index'. Example:

resultdict = df.to_dict(orient='index')

Demo:

In [73]: df
Out[73]:
   Col1  Col2  Col3
a     1     2     3
b     4     5     6
c     7     8     9

In [74]: df.T.to_dict()
Out[74]:
{'a': {'Col1': 1, 'Col2': 2, 'Col3': 3},
 'b': {'Col1': 4, 'Col2': 5, 'Col3': 6},
 'c': {'Col1': 7, 'Col2': 8, 'Col3': 9}}
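As a rough, runnable sketch of the same demo, the snippet below rebuilds the frame shown above and checks that the transpose form and the orient='index' form give the same nested dictionary. The variable names via_transpose and via_orient are made up for illustration, and orient='index' assumes Pandas 0.17.0 or later.

```python
import pandas as pd

# Rebuild the demo frame shown above.
df = pd.DataFrame(
    {"Col1": [1, 4, 7], "Col2": [2, 5, 8], "Col3": [3, 6, 9]},
    index=["a", "b", "c"],
)

# Outer keys are index labels, inner keys are column names.
via_transpose = df.T.to_dict()
via_orient = df.to_dict(orient="index")

# Look up one row by its index label.
print(via_orient["b"])

# Both spellings should agree for this frame.
print(via_transpose == via_orient)  # True
```

For a frame like this one, where every column has the same dtype, the two forms are interchangeable; orient='index' simply avoids building the transposed frame first.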

If you want the values of the outer dictionary to be instances of class Sample, though I highly doubt that is useful at all, then you can do:

class Sample:
    def __init__(self,D_key_value):
        self.D_key_value = D_key_value 

resultdict = df.T.to_dict()

resultdict = {k:Sample(v) for k,v in resultdict.items()}
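To tie this back to the question, here is a minimal end-to-end sketch. The small DF_gex frame and its labels (sample_1, geneA, and so on) are hypothetical stand-ins for the asker's data, and the variables looped and matches exist only to cross-check the result against the original nested-loop approach.

```python
import pandas as pd


class Sample:
    def __init__(self, D_key_value):
        self.D_key_value = D_key_value


# Hypothetical stand-in for the asker's DF_gex (rows = samples, columns = features).
DF_gex = pd.DataFrame(
    {"geneA": [0.5, 1.5], "geneB": [2.0, 3.0]},
    index=["sample_1", "sample_2"],
)

# One to_dict call replaces the nested loops; each row dict is wrapped in a Sample.
D_sample_Data = {
    sample: Sample(features)
    for sample, features in DF_gex.to_dict(orient="index").items()
}

# Cross-check against the explicit row-by-row, column-by-column loop.
looped = {}
for i in range(DF_gex.shape[0]):
    row = {}
    for j in range(DF_gex.shape[1]):
        row[DF_gex.columns[j]] = DF_gex.iloc[i, j]
    looped[DF_gex.index[i]] = row

matches = all(
    D_sample_Data[name].D_key_value == looped[name] for name in looped
)

print(D_sample_Data["sample_2"].D_key_value["geneB"])  # 3.0
print(matches)  # True
```

Looking up a sample name returns a Sample instance whose D_key_value holds that row's feature values, which is the structure the question describes.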

Answer 1
1 Accepted 2015-10-15 20:04:07
