Pandas pivoting/stacking/reshaping

I'm trying to import data into a pandas DataFrame whose columns are date string, label, and value. My data looks like the following (just with 4 dates and 5 labels):

from numpy import random
import numpy as np
import pandas as pd

# Creating the data
dates = ("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04")
values = [random.rand(5) for _ in range(4)]

data = dict(zip(dates,values))

So, the data is a dictionary whose keys are dates and whose values are lists of values, where the position in each list is the label.

Loading this data structure into a DataFrame

df1 = pd.DataFrame(data)

gives me the dates as columns, the labels as the index, and the values as the cell values.
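To make that orientation concrete, here is a small check. It is a sketch that swaps the random values for hypothetical deterministic ones (the value for label j on the i-th date is 10 * i + j) so the result can be verified; the variable names mirror the question's.

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic stand-in for the random data in the question:
# the value for label j on the i-th date is 10 * i + j.
dates = ("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04")
data = {d: np.arange(5) + 10 * i for i, d in enumerate(dates)}

df1 = pd.DataFrame(data)

# Dict keys (the dates) become columns; positions inside each array
# (the labels) become the default RangeIndex.
print(df1.shape)                 # (5, 4)
print(list(df1.index))           # [0, 1, 2, 3, 4]
print(df1.loc[2, "2015-01-03"])  # label 2 on the third date: 22
```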

An alternative loading would be

# from_dict is a classmethod that returns a new DataFrame, so keep its result
df2 = pd.DataFrame.from_dict(data, orient='index')

where the dates are the index and the labels are the columns.
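Because `from_dict` is a classmethod that returns a new DataFrame, calling it on an empty instance and discarding the result leaves that instance empty. A minimal sketch of the date-indexed orientation, again using hypothetical deterministic values in place of the random ones, shows it is simply the transpose of the first layout:

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic values: label j on the i-th date is 10 * i + j.
dates = ("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04")
data = {d: np.arange(5) + 10 * i for i, d in enumerate(dates)}

# Call from_dict on the class and keep the returned frame.
df2 = pd.DataFrame.from_dict(data, orient="index")

print(df2.shape)        # (4, 5): dates as the index, labels 0-4 as columns
print(list(df2.index))  # the four date strings

# This orientation holds the same numbers as pd.DataFrame(data), transposed.
print(np.array_equal(df2.to_numpy(), pd.DataFrame(data).T.to_numpy()))  # True
```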

In neither case do I manage to pivot or stack the data into my preferred view (one row per date/label pair, with date, label and value as columns).

How should I approach the pivoting/stacking to get the view I want? Or should I change my data structure before loading it into a DataFrame? In particular, I'd like to avoid having to create all the rows of the table beforehand with a bunch of calls to zip.

IIUC:

Option 1
pd.DataFrame.stack

pd.DataFrame(data).stack() \
    .rename('value').rename_axis(['label', 'date']).reset_index()

    label        date     value
0       0  2015-01-01  0.345109
1       0  2015-01-02  0.815948
2       0  2015-01-03  0.758709
3       0  2015-01-04  0.461838
4       1  2015-01-01  0.584527
5       1  2015-01-02  0.823529
6       1  2015-01-03  0.714700
7       1  2015-01-04  0.160735
8       2  2015-01-01  0.779006
9       2  2015-01-02  0.721576
10      2  2015-01-03  0.246975
11      2  2015-01-04  0.270491
12      3  2015-01-01  0.465495
13      3  2015-01-02  0.622024
14      3  2015-01-03  0.227865
15      3  2015-01-04  0.638772
16      4  2015-01-01  0.266322
17      4  2015-01-02  0.575298
18      4  2015-01-03  0.335095
19      4  2015-01-04  0.761181
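If you start from the `from_dict(..., orient='index')` orientation instead, the same `stack` idea applies, only with the index levels in the opposite order. A sketch, again with hypothetical deterministic values; here the rows come out grouped by date rather than by label:

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic values: label j on the i-th date is 10 * i + j.
dates = ("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04")
data = {d: np.arange(5) + 10 * i for i, d in enumerate(dates)}

# Dates are the index and labels the columns, so stack() moves the labels
# into the inner index level: the resulting MultiIndex is (date, label).
long_df = (
    pd.DataFrame.from_dict(data, orient="index")
    .stack()
    .rename("value")
    .rename_axis(["date", "label"])
    .reset_index()
)
print(long_df.head())
```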

Option 2
list comprehension

pd.DataFrame(
    [[i, d, v] for d, l in data.items() for i, v in enumerate(l)],
    columns=['label', 'date', 'value']
)

    label        date     value
0       0  2015-01-01  0.345109
1       1  2015-01-01  0.584527
2       2  2015-01-01  0.779006
3       3  2015-01-01  0.465495
4       4  2015-01-01  0.266322
5       0  2015-01-02  0.815948
6       1  2015-01-02  0.823529
7       2  2015-01-02  0.721576
8       3  2015-01-02  0.622024
9       4  2015-01-02  0.575298
10      0  2015-01-03  0.758709
11      1  2015-01-03  0.714700
12      2  2015-01-03  0.246975
13      3  2015-01-03  0.227865
14      4  2015-01-03  0.335095
15      0  2015-01-04  0.461838
16      1  2015-01-04  0.160735
17      2  2015-01-04  0.270491
18      3  2015-01-04  0.638772
19      4  2015-01-04  0.761181
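Another route, not one of the two options above, is `DataFrame.melt`, which unpivots the wide frame without building rows by hand; `DataFrame.pivot` then reverses it. A sketch with hypothetical deterministic values; the melted rows come out in the same order as Option 2 (grouped by date):

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic values: label j on the i-th date is 10 * i + j.
dates = ("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04")
data = {d: np.arange(5) + 10 * i for i, d in enumerate(dates)}

# Turn the label index into a regular column, then unpivot the date columns.
long_df = (
    pd.DataFrame(data)
    .rename_axis("label")
    .reset_index()
    .melt(id_vars="label", var_name="date", value_name="value")
)

# pivot goes the other way: one row per label, one column per date.
wide = long_df.pivot(index="label", columns="date", values="value")

print(long_df.head())
print(wide)
```

The round trip is a quick sanity check that no date/label pair was lost while reshaping.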

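Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.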

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM