
DataFrame of DataFrames with pandas

I have the following DataFrame, which gathers daily stats on two measures, A and B:

                  A             B
count  17266.000000  17266.000000
std        0.179003      0.178781
75%      101.102251    101.053214
min      100.700993    100.651956
mean     101.016747    100.964003
max      101.540214    101.491178
50%      100.988465    100.938694
25%      100.885251    100.830048

Below is a piece of code that creates it:

import pandas

# Summary statistics for one day, as produced by describe()
day1 = {
    'A': {
        'count': 17266.0,
        'std': 0.17900265293286116,
        'min': 100.70099294189714,
        'max': 101.54021448871775,
        '50%': 100.98846526697825,
        '25%': 100.88525124427971,
        '75%': 101.10225131847992,
        'mean': 101.01674677794136
    },
    'B': {
        'count': 17266.0,
        'std': 0.17878125983374854,
        'min': 100.65195609992342,
        'max': 101.49117764674403,
        '50%': 100.93869409089723,
        '25%': 100.83004837814667,
        '75%': 101.05321447650618,
        'mean': 100.96400305527138
    }
}
# Build one row per measure, then transpose so A and B become columns
df = pandas.DataFrame.from_dict(day1, orient='index').T

The data come straight from a describe() call. I have several such summaries (one for each day) and I would like to gather them all into a single DataFrame that has the date as an index.

The most obvious way to obtain that would be to stack all the daily results into one DataFrame, then group it by day and run the stats on the result. However, I would like an alternative method, because I run into a MemoryError with the amount of data I process.
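For reference, a rough sketch of that stack-then-group approach. The `raw` frame and its random values are invented here to stand in for the real data, which is not shown:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: two days of per-minute observations of A and B.
idx = pd.date_range("2014-12-24", periods=2 * 24 * 60, freq="min")
rng = np.random.default_rng(0)
raw = pd.DataFrame(
    {"A": rng.normal(100, 0.01, len(idx)),
     "B": rng.normal(100, 0.01, len(idx))},
    index=idx,
)

# Group every observation by calendar day and describe each group.
# The whole of `raw` has to be in memory for this step.
stats = raw.groupby(raw.index.date).describe()

# `stats` has one row per day and (measure, statistic) column pairs;
# stacking the statistic level moves it into the index.
result = stats.stack(level=1)
print(result)
```

Since `raw` holds every observation at once, this is presumably the step that runs out of memory on large inputs.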

The final outcome should look like this:

                        A           B    
2014-12-24 count  15895.000000  15895.000000
        mean      99.943618     99.968860
        std        0.012468      0.011932
        min       99.877695     99.928778
        25%       99.934890     99.960445
        50%       99.943453     99.968847
        75%       99.952340     99.977571
        max       99.982930    100.002507
2014-12-25 count  16278.000000  16278.000000
        mean      99.937056     99.962203
        std        0.012395      0.012661
        min       99.884501     99.910567
        25%       99.928078     99.953758
        50%       99.936754     99.962411
        75%       99.945914     99.971473
        max       99.981512    100.003770

If you are able to make a dict of {date: describe_df_for_that_day}, then you can use pd.concat(dict). The dict keys become the outer level of the resulting index.

Starting with your df:

In [14]: d = {'2014-12-24': df, '2014-12-25': df}

In [15]: pd.concat(d)
Out[15]:
                             A             B
2014-12-24 count  17266.000000  17266.000000
           std        0.179003      0.178781
           75%      101.102251    101.053214
           min      100.700993    100.651956
           mean     101.016747    100.964003
           max      101.540214    101.491178
           50%      100.988465    100.938694
           25%      100.885251    100.830048
2014-12-25 count  17266.000000  17266.000000
           std        0.179003      0.178781
           75%      101.102251    101.053214
           min      100.700993    100.651956
           mean     101.016747    100.964003
           max      101.540214    101.491178
           50%      100.988465    100.938694
           25%      100.885251    100.830048

You can of course make the keys real dates instead of strings.
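A minimal sketch of that idea with real date keys, assuming each day's raw data can be loaded separately. `load_day` is a hypothetical placeholder that generates random numbers instead of reading real data, and `names` is an optional `pd.concat` argument used here only to label the two index levels:

```python
import numpy as np
import pandas as pd

def load_day(day):
    """Hypothetical loader: stands in for reading one day's raw data."""
    rng = np.random.default_rng(day.day)
    return pd.DataFrame({"A": rng.normal(100, 0.01, 1000),
                         "B": rng.normal(100, 0.01, 1000)})

days = pd.date_range("2014-12-24", "2014-12-25", freq="D")

# Only one day's raw data is alive at a time; just its small
# describe() summary is kept in the dict, keyed by a Timestamp.
daily = {day: load_day(day).describe() for day in days}

# The dict keys become the outer index level.
result = pd.concat(daily, names=["date", "stat"])
print(result)
```

With the levels named, a single statistic across all days can be pulled out with `result.xs("mean", level="stat")`.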
