Pandas pivoting/stacking/reshaping
I'm trying to import data into a pandas DataFrame whose columns are date string, label, and value. My data looks like the following (with just 4 dates and 5 labels):
import numpy as np
import pandas as pd
# Creating the data: one array of 5 random values per date
dates = ("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04")
values = [np.random.rand(5) for _ in range(4)]
# Map each date string to its array of values
data = dict(zip(dates, values))
So the data is a dictionary where the keys are dates and each value is an array of numbers whose position in the array is the label.
Loading this data structure into a DataFrame
df1 = pd.DataFrame(data)
gives me the dates as columns, the labels as the index, and the values as the cell values.
An alternative way of loading would be
df2 = pd.DataFrame.from_dict(data, orient='index')
where the dates are the index and the labels are the columns.
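For reference, a minimal sketch of the two layouts side by side. The seeded generator is an assumption added only to make the example reproducible; it is not part of the original data. Note that `from_dict` is a classmethod that returns a new frame, so its result has to be assigned.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed, used here only for reproducibility
rng = np.random.default_rng(0)
dates = ("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04")
data = {d: rng.random(5) for d in dates}

# Dates become columns, labels (array positions) become the index
df1 = pd.DataFrame(data)

# Dates become the index, labels become the columns
df2 = pd.DataFrame.from_dict(data, orient="index")

print(df1.shape)  # (5, 4)
print(df2.shape)  # (4, 5)
```

The second frame holds the same numbers as the transpose of the first.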
In neither case do I manage to pivot or stack the data into my preferred view.
How should I approach the pivoting/stacking to get the view I want? Or should I change my data structure before loading it into a DataFrame? In particular, I'd like to avoid having to create all the rows of the table beforehand with a bunch of calls to zip.
IIUC:
Option 1
pd.DataFrame.stack
pd.DataFrame(data).stack() \
.rename('value').rename_axis(['label', 'date']).reset_index()
    label        date     value
0       0  2015-01-01  0.345109
1       0  2015-01-02  0.815948
2       0  2015-01-03  0.758709
3       0  2015-01-04  0.461838
4       1  2015-01-01  0.584527
5       1  2015-01-02  0.823529
6       1  2015-01-03  0.714700
7       1  2015-01-04  0.160735
8       2  2015-01-01  0.779006
9       2  2015-01-02  0.721576
10      2  2015-01-03  0.246975
11      2  2015-01-04  0.270491
12      3  2015-01-01  0.465495
13      3  2015-01-02  0.622024
14      3  2015-01-03  0.227865
15      3  2015-01-04  0.638772
16      4  2015-01-01  0.266322
17      4  2015-01-02  0.575298
18      4  2015-01-03  0.335095
19      4  2015-01-04  0.761181
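To make the chain in Option 1 easier to follow, here is a rough step-by-step sketch of the same calls. The seeded generator and the intermediate variable names (`wide`, `stacked`, `named`, `long_df`) are assumptions introduced for illustration only.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed, used here only for reproducibility
rng = np.random.default_rng(0)
dates = ("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04")
data = {d: rng.random(5) for d in dates}

# Wide frame: rows are labels 0..4, columns are the date strings
wide = pd.DataFrame(data)

# stack() moves the column labels into an inner index level,
# giving a Series indexed by (row label, column name)
stacked = wide.stack()

# Name the Series and its two index levels
named = stacked.rename("value").rename_axis(["label", "date"])

# reset_index() turns both index levels into ordinary columns
long_df = named.reset_index()

print(stacked.index.nlevels)    # 2
print(list(long_df.columns))    # ['label', 'date', 'value']
print(len(long_df))             # 20
```

Each row of the result is one (label, date, value) triple, ordered by label first and date second.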
Option 2
comprehension
pd.DataFrame(
[[i, d, v] for d, l in data.items() for i, v in enumerate(l)],
columns=['label', 'date', 'value']
)
    label        date     value
0       0  2015-01-01  0.345109
1       1  2015-01-01  0.584527
2       2  2015-01-01  0.779006
3       3  2015-01-01  0.465495
4       4  2015-01-01  0.266322
5       0  2015-01-02  0.815948
6       1  2015-01-02  0.823529
7       2  2015-01-02  0.721576
8       3  2015-01-02  0.622024
9       4  2015-01-02  0.575298
10      0  2015-01-03  0.758709
11      1  2015-01-03  0.714700
12      2  2015-01-03  0.246975
13      3  2015-01-03  0.227865
14      4  2015-01-03  0.335095
15      0  2015-01-04  0.461838
16      1  2015-01-04  0.160735
17      2  2015-01-04  0.270491
18      3  2015-01-04  0.638772
19      4  2015-01-04  0.761181
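The two options return the same rows in a different order: Option 1 is grouped by label, Option 2 by date. A small sketch that checks this, assuming seeded data and hypothetical names `opt1`, `opt2`, and `opt2_sorted`:

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed, used here only for reproducibility
rng = np.random.default_rng(0)
dates = ("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04")
data = {d: rng.random(5) for d in dates}

# Option 1: stack the wide frame into long format
opt1 = (
    pd.DataFrame(data)
    .stack()
    .rename("value")
    .rename_axis(["label", "date"])
    .reset_index()
)

# Option 2: build the long rows directly with a list comprehension
opt2 = pd.DataFrame(
    [[i, d, v] for d, vals in data.items() for i, v in enumerate(vals)],
    columns=["label", "date", "value"],
)

# Bring Option 2 into Option 1's row order before comparing
opt2_sorted = opt2.sort_values(["label", "date"]).reset_index(drop=True)

same = (
    opt1["label"].tolist() == opt2_sorted["label"].tolist()
    and opt1["date"].tolist() == opt2_sorted["date"].tolist()
    and np.allclose(opt1["value"], opt2_sorted["value"])
)
print(same)  # True
```

If the row order matters downstream, sorting once with `sort_values` as above is enough to make either option match the other.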