Fastest way to store data from Pandas DataFrame

I was checking out "Fastest way to iterate through a pandas dataframe?" and I wasn't sure if it could be applied to my situation. I want to make a dictionary of the samples and features in the DataFrame:

#DF_gex is a DataFrame

D_sample_Data = {}

class Sample:
    def __init__(self,D_key_value):
        self.D_key_value = D_key_value 

for i in range(DF_gex.shape[0]):
    D_key_value = {}
    sample = DF_gex.index[i]
    for j in range(DF_gex.shape[1]):
        key = DF_gex.columns[j]
        value = DF_gex.iloc[i,j]
        D_key_value[key] = value
    D_sample_Data[sample] = Sample(D_key_value)

I basically have a class called Sample in this case; in the Sample class I store a dictionary for each instance (D_key_value). Right now I'm iterating through every row and every column.

Is there a quicker way of doing this? I know that Pandas is based on NumPy arrays, which have special features for indexing. Can one of those be used for this?

In the end, I will have a dictionary object D_sample_Data where I input a sample name and get a class instance. In that class instance, there will be a dictionary object unique to that sample key.

If you simply want a dictionary of dictionaries, where the keys of the outer dictionary are the index labels, the keys of the inner dictionaries are the column names, and the values are the corresponding values at each index-column pair (or a dictionary of class instances that each hold such a dictionary), then you don't need loops. You can simply use the DataFrame.to_dict() method. Example:

resultdict = df.T.to_dict()

Or, from Pandas version 0.17.0, you can also use the keyword argument orient='index'. Example:

resultdict = df.to_dict(orient='index')

Demo:

In [73]: df
Out[73]:
   Col1  Col2  Col3
a     1     2     3
b     4     5     6
c     7     8     9

In [74]: df.T.to_dict()
Out[74]:
{'a': {'Col1': 1, 'Col2': 2, 'Col3': 3},
 'b': {'Col1': 4, 'Col2': 5, 'Col3': 6},
 'c': {'Col1': 7, 'Col2': 8, 'Col3': 9}}
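
For anyone who wants to run the demo outside an interactive session, here is a minimal self-contained sketch. The frame is rebuilt from the values shown above, and the names via_transpose and via_orient are only illustrative. It assumes pandas 0.17.0 or later so that orient='index' is available, and checks that both forms give the same nested dictionary.

```python
import pandas as pd

# Rebuild the small demo frame: index labels a, b, c and three columns.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["a", "b", "c"],
    columns=["Col1", "Col2", "Col3"],
)

# Transpose first, so the original index labels become the outer keys.
via_transpose = df.T.to_dict()

# Ask to_dict directly for index-keyed output (assumes pandas >= 0.17.0).
via_orient = df.to_dict(orient="index")

# Both approaches should produce the same {index: {column: value}} mapping.
print(via_transpose == via_orient)
```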

If you want the values of the outer dictionary to be instances of class Sample, though I highly doubt that is useful at all, then you can do:

class Sample:
    def __init__(self,D_key_value):
        self.D_key_value = D_key_value 

resultdict = df.T.to_dict()

resultdict = {k:Sample(v) for k,v in resultdict.items()}
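
Putting the pieces together, a rough end-to-end sketch might look like the following. The demo frame stands in for DF_gex from the question, and using orient='index' instead of the transpose is an assumption that requires pandas 0.17.0 or later. The lookup at the end mirrors the "input a sample name, get a class instance" usage described in the question.

```python
import pandas as pd


class Sample:
    def __init__(self, D_key_value):
        # Per-sample mapping of feature name -> value.
        self.D_key_value = D_key_value


# Stand-in for DF_gex: rows are samples, columns are features.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["a", "b", "c"],
    columns=["Col1", "Col2", "Col3"],
)

# Build {sample name: Sample instance} without explicit row/column loops.
D_sample_Data = {
    name: Sample(row) for name, row in df.to_dict(orient="index").items()
}

# Look up one sample by name, then one feature within it.
print(D_sample_Data["b"].D_key_value["Col2"])  # prints 5
```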
