Create MultiIndex DataFrame from a Dict of Series of Numpy Array
Given a dictionary of pandas.Series with a numpy.array in each cell:
import pandas as pd
import numpy as np
N = 5
foo = [x for x in np.random.randint(10, size=(N,8))] # list of ndarray
bar = [x for x in np.random.randint(10, size=(N,8))] # list of ndarray
baz = [x for x in np.random.randint(10, size=(N,8))] # list of ndarray
input = {
'foo': pd.Series(foo, index=pd.date_range('2020-01-01', periods=N, freq='D')),
'bar': pd.Series(bar, index=pd.date_range('2020-01-01', periods=N, freq='D')),
'baz': pd.Series(baz, index=pd.date_range('2020-01-01', periods=N, freq='D')),
}
print(input)
# {'foo':
# 2020-01-01 [4, 1, 3, 3, 4, 6, 0, 2]
# 2020-01-02 [7, 7, 1, 2, 1, 2, 8, 6]
# 2020-01-03 [1, 0, 6, 8, 1, 8, 2, 3]
# 2020-01-04 [1, 5, 6, 0, 1, 8, 8, 4]
# 2020-01-05 [4, 7, 9, 3, 5, 3, 0, 1]
# Freq: D, dtype: object,
# 'bar':
# 2020-01-01 [0, 2, 2, 5, 4, 9, 7, 9]
# 2020-01-02 [7, 0, 8, 0, 7, 8, 8, 9]
# 2020-01-03 [6, 7, 2, 7, 2, 9, 8, 7]
# 2020-01-04 [1, 8, 8, 9, 6, 1, 4, 6]
# 2020-01-05 [9, 4, 4, 2, 6, 2, 7, 7]
# Freq: D, dtype: object,
# 'baz':
# 2020-01-01 [9, 2, 9, 2, 5, 3, 5, 3]
# 2020-01-02 [6, 5, 3, 3, 9, 7, 7, 9]
# 2020-01-03 [5, 7, 0, 6, 1, 5, 6, 7]
# 2020-01-04 [3, 9, 2, 6, 1, 5, 9, 9]
# 2020-01-05 [2, 7, 6, 4, 1, 2, 9, 2]
# Freq: D, dtype: object}
What is the most efficient method to convert this into a MultiIndex pandas DataFrame with the dictionary key in the first multi-index level and the series' DatetimeIndex in the second multi-index level?
Using the example given above, the desired pandas DataFrame will have 15 rows and 8 columns.
When using random data, please set a seed so that your data is reproducible.
You can use pandas concat combined with numpy's vstack to get your desired output:
np.random.seed(5)
N = 5
foo = [x for x in np.random.randint(10, size=(N, 8))] # list of ndarray
bar = [x for x in np.random.randint(10, size=(N, 8))] # list of ndarray
baz = [x for x in np.random.randint(10, size=(N, 8))] # list of ndarray
data = {
"foo": pd.Series(foo, index=pd.date_range("2020-01-01", periods=N, freq="D")),
"bar": pd.Series(bar, index=pd.date_range("2020-01-01", periods=N, freq="D")),
"baz": pd.Series(baz, index=pd.date_range("2020-01-01", periods=N, freq="D")),
}
box = pd.concat(data)
pd.DataFrame(np.vstack(box), index=box.index)
                0  1  2  3  4  5  6  7
foo 2020-01-01  3  6  6  0  9  8  4  7
    2020-01-02  0  0  7  1  5  7  0  1
    2020-01-03  4  6  2  9  9  9  9  1
    2020-01-04  2  7  0  5  0  0  4  4
    2020-01-05  9  3  2  4  6  9  3  3
bar 2020-01-01  2  1  5  7  4  3  1  7
    2020-01-02  3  1  9  5  7  0  9  6
    2020-01-03  0  5  2  8  6  8  0  5
    2020-01-04  2  0  7  7  6  0  0  8
    2020-01-05  5  5  9  6  4  5  2  8
baz 2020-01-01  8  1  6  3  4  1  8  0
    2020-01-02  2  2  4  1  6  3  4  3
    2020-01-03  1  4  2  3  4  9  4  0
    2020-01-04  6  6  9  2  9  3  0  8
    2020-01-05  8  9  7  4  8  6  8  0
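The concat and vstack steps can also be wrapped in a small helper that names the two index levels. This is only a minimal sketch: the function name `dict_of_series_to_frame` and the level names `key` and `date` are illustrative choices, not part of the answer above, and it assumes every cell holds an array of the same length.

```python
import numpy as np
import pandas as pd


def dict_of_series_to_frame(data, names=("key", "date")):
    """Turn a dict of Series (one equal-length array per cell) into a MultiIndex DataFrame.

    `names` labels the two index levels; these labels are illustrative assumptions.
    """
    # Dict keys become the outer index level, each Series' own index the inner level.
    box = pd.concat(data, names=list(names))
    # Stack the per-cell arrays into one 2-D array, one row per (key, date) pair.
    return pd.DataFrame(np.vstack(list(box)), index=box.index)


np.random.seed(5)
N = 5
idx = pd.date_range("2020-01-01", periods=N, freq="D")
data = {
    key: pd.Series(list(np.random.randint(10, size=(N, 8))), index=idx)
    for key in ("foo", "bar", "baz")
}

df = dict_of_series_to_frame(data)
print(df.shape)              # rows = 3 keys * N dates, columns = array length
print(list(df.index.names))
```

With the levels in this order, `df.loc["foo"]` selects the five rows that belong to one dictionary key.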
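A simple way is to rely on pandas' own reshaping methods: stack(), to_frame() and swaplevel():
# `input` is the dictionary of Series defined in the question
# stack() puts (date, key) in the index; swaplevel() reorders it to (key, date)
df = pd.DataFrame(input).stack().to_frame().swaplevel()
# expand each array cell into its own row of 8 columns
df.iloc[:,0].apply(lambda x: pd.Series({idx: value for idx, value in enumerate(x)}))
produces (rows are ordered by date, and the random values differ from the example above):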
                0  1  2  3  4  5  6  7
foo 2020-01-01  2  3  5  1  7  0  8  2
bar 2020-01-01  8  1  4  6  1  7  3  1
baz 2020-01-01  7  3  4  3  9  0  5  0
foo 2020-01-02  8  3  8  1  6  5  5  4
bar 2020-01-02  2  1  9  5  6  6  1  4
baz 2020-01-02  4  3  3  8  7  4  2  4
foo 2020-01-03  8  8  5  2  9  4  1  1
bar 2020-01-03  0  0  0  8  8  5  8  5
baz 2020-01-03  1  5  5  9  5  2  2  7
foo 2020-01-04  2  7  6  3  0  8  2  5
bar 2020-01-04  1  8  0  3  1  5  1  3
baz 2020-01-04  5  0  7  6  1  7  7  9
foo 2020-01-05  9  0  8  5  9  9  6  8
bar 2020-01-05  0  3  1  6  4  1  9  6
baz 2020-01-05  4  6  6  7  9  3  0  5
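Since the question asks about efficiency, it may help to time the two ways of expanding the array cells: one np.vstack call versus building a pd.Series per row with apply. The sketch below is a rough comparison under assumptions of my own: N = 200 is an arbitrary size, the helper names `expand_vstack` and `expand_apply` are hypothetical, and `apply(pd.Series)` stands in for the dict-based lambda used above. Timings depend on the machine and library versions, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(5)
N = 200  # arbitrary size, larger than the 5-row example so timings are measurable
idx = pd.date_range("2020-01-01", periods=N, freq="D")
data = {
    key: pd.Series(list(np.random.randint(10, size=(N, 8))), index=idx)
    for key in ("foo", "bar", "baz")
}
box = pd.concat(data)  # Series with a (key, date) MultiIndex and one array per cell


def expand_vstack(s):
    # One NumPy call stacks every array into a single 2-D block.
    return pd.DataFrame(np.vstack(list(s)), index=s.index)


def expand_apply(s):
    # Constructs a new pd.Series for every row at Python level.
    return s.apply(pd.Series)


t_vstack = timeit.timeit(lambda: expand_vstack(box), number=3)
t_apply = timeit.timeit(lambda: expand_apply(box), number=3)
print(f"vstack: {t_vstack:.4f}s  apply: {t_apply:.4f}s")

# Both expansions should hold the same values under the same index.
same_values = np.array_equal(expand_vstack(box).to_numpy(), expand_apply(box).to_numpy())
print(same_values)
```

Both functions start from the same concatenated Series, so any difference in the timings comes from the expansion step alone.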