DataFrame of DataFrames with pandas
I have the following DataFrame gathering daily stats on 2 measures A and B:
                  A             B
count  17266.000000  17266.000000
std        0.179003      0.178781
75%      101.102251    101.053214
min      100.700993    100.651956
mean     101.016747    100.964003
max      101.540214    101.491178
50%      100.988465    100.938694
25%      100.885251    100.830048
Below is a piece of code that creates it:
import pandas

# Summary statistics for one day, keyed by measure and then by statistic
day1 = {
    'A': {
        'count': 17266.0,
        'std': 0.17900265293286116,
        'min': 100.70099294189714,
        'max': 101.54021448871775,
        '50%': 100.98846526697825,
        '25%': 100.88525124427971,
        '75%': 101.10225131847992,
        'mean': 101.01674677794136
    },
    'B': {
        'count': 17266.0,
        'std': 0.17878125983374854,
        'min': 100.65195609992342,
        'max': 101.49117764674403,
        '50%': 100.93869409089723,
        '25%': 100.83004837814667,
        '75%': 101.05321447650618,
        'mean': 100.96400305527138
    }
}
# orient='index' puts A and B on the rows; .T transposes so they become columns
df = pandas.DataFrame.from_dict(day1, orient='index').T
The data come straight out of a describe() call. I have several such describe() outputs (one for each day) and I would like to gather them all into a single DataFrame that has the date as an index.
The most obvious way to obtain that would be to stack all the daily results into one DataFrame, then group it by day and run the stats on the result. However, I would like an alternative method because I run into a MemoryError with the amount of data I process.
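For reference, the stack-then-group approach described above might look roughly like the sketch below. The raw frame, its timestamps, and the random values are made up for illustration; only the group-by-day-then-describe step corresponds to the text. It needs all of the raw data in memory at once, which is where the MemoryError comes from.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: two days of hourly measurements for A and B.
rng = np.random.default_rng(0)
idx = pd.date_range("2014-12-24", periods=48, freq="60min")
raw = pd.DataFrame(
    {"A": rng.normal(100, 0.01, len(idx)), "B": rng.normal(100, 0.01, len(idx))},
    index=idx,
)

# Group every row by its calendar day and describe each group.
# The whole of `raw` has to be held in memory for this to work.
stats = raw.groupby(raw.index.normalize()).apply(lambda g: g.describe())
print(stats)
```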
The final outcome should look like this:
                             A             B
2014-12-24 count  15895.000000  15895.000000
           mean      99.943618     99.968860
           std        0.012468      0.011932
           min       99.877695     99.928778
           25%       99.934890     99.960445
           50%       99.943453     99.968847
           75%       99.952340     99.977571
           max       99.982930    100.002507
2014-12-25 count  16278.000000  16278.000000
           mean      99.937056     99.962203
           std        0.012395      0.012661
           min       99.884501     99.910567
           25%       99.928078     99.953758
           50%       99.936754     99.962411
           75%       99.945914     99.971473
           max       99.981512    100.003770
If you are able to make a dict of {date: describe_df_for_that_day}, then you can use pd.concat(dict).
Starting with your df:
In [14]: d = {'2014-12-24': df, '2014-12-25': df}
In [15]: pd.concat(d)
Out[15]:
                             A             B
2014-12-24 count  17266.000000  17266.000000
           std        0.179003      0.178781
           75%      101.102251    101.053214
           min      100.700993    100.651956
           mean     101.016747    100.964003
           max      101.540214    101.491178
           50%      100.988465    100.938694
           25%      100.885251    100.830048
2014-12-25 count  17266.000000  17266.000000
           std        0.179003      0.178781
           75%      101.102251    101.053214
           min      100.700993    100.651956
           mean     101.016747    100.964003
           max      101.540214    101.491178
           50%      100.988465    100.938694
           25%      100.885251    100.830048
You can of course make the keys real dates instead of strings.
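As a rough sketch of how that could look with real dates as keys: the loop below summarises one day at a time, so only the small describe() frames are kept around. Note that load_day is a hypothetical stand-in for however you actually read a single day's data, and the level names "date" and "stat" are just labels picked for the example.

```python
import numpy as np
import pandas as pd

def load_day(day, n=1000):
    # Hypothetical loader: replace with whatever reads one day's raw data.
    rng = np.random.default_rng(day.toordinal())
    return pd.DataFrame(
        {"A": rng.normal(100, 0.01, n), "B": rng.normal(100, 0.01, n)}
    )

days = pd.date_range("2014-12-24", periods=2, freq="D")

# Keep only the describe() summary for each day, keyed by a real Timestamp.
summaries = {day: load_day(day).describe() for day in days}

# The dict keys become the outer index level; names= labels both levels.
result = pd.concat(summaries, names=["date", "stat"])
print(result)

# With Timestamp keys, a single day's summary can be pulled out directly.
one_day = result.loc[pd.Timestamp("2014-12-25")]
```

Because each day's raw frame goes out of scope once it has been described, peak memory should stay close to the size of one day's data rather than the whole history.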