
Best Way To Unpivot a Pandas Dataframe

I have some data that is missing values on weekends, public holidays, etc.

 datadate  | id | Value 
-----------------------
1999-12-31 | 01 |  1.0 
1999-12-31 | 02 |  0.5
1999-12-31 | 03 |  3.2
2000-01-04 | 01 |  1.0
2000-01-04 | 02 |  0.7
2000-01-04 | 03 |  3.2

I want to copy the values down over the dates for which the data is missing. So, I've pivoted the frame, re-indexed it, and copied the values down.

 datadate  | 01  | 02  | 03 
----------------------------
1999-12-31 | 1.0 | 0.5 | 3.2
2000-01-01 | 1.0 | 0.5 | 3.2
2000-01-02 | 1.0 | 0.5 | 3.2
2000-01-03 | 1.0 | 0.5 | 3.2
2000-01-04 | 1.0 | 0.7 | 3.2
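
For reference, the pivot, re-index, and copy-down steps described above could look roughly like this. It is a minimal sketch, not the asker's actual code: the names `long_df`, `wide`, and `all_days` are made up for illustration, and the ids are assumed to be strings so that the leading zeros survive.

```python
import pandas as pd

# Long-format sample copied from the question; ids kept as strings (assumption)
long_df = pd.DataFrame({
    "datadate": pd.to_datetime(["1999-12-31"] * 3 + ["2000-01-04"] * 3),
    "id": ["01", "02", "03"] * 2,
    "Value": [1.0, 0.5, 3.2, 1.0, 0.7, 3.2],
})

# One column per id, one row per observed date
wide = long_df.pivot(index="datadate", columns="id", values="Value")

# Add the missing calendar days, then carry the last known value forward
all_days = pd.date_range(wide.index.min(), wide.index.max(), freq="D", name="datadate")
wide = wide.reindex(all_days).ffill()

print(wide)
```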

Now I want to return the data to its original form. I've tried using pd.melt() and df.unstack(), but I end up with more columns than I want, and constructing a new data frame from the result takes a long time.

Is there a better way to unpivot the data?

There is a pandas.pivot_table function, and if you set datadate and id as the index, you can unstack the dataframe.

That'd be:

from io import StringIO
import pandas

# Sample data from the question, pipe-delimited
datatable = StringIO("""\
datadate   | id | Value
1999-12-31 | 01 |  1.0
1999-12-31 | 02 |  0.5
1999-12-31 | 03 |  3.2
2000-01-04 | 01 |  1.0
2000-01-04 | 02 |  0.7
2000-01-04 | 03 |  3.2""")

# Every calendar day to fill, including days with no observations
fullindex = pandas.date_range(start='1999-12-31', end='2000-01-06', freq='D')
df = (
    pandas.read_table(datatable, sep=r'\s+\|\s+', engine='python', parse_dates=['datadate'])
          .set_index(['datadate', 'id'])
          .unstack(level='id')    # pivot: one column per id
          .reindex(fullindex)     # add rows for the missing days
          .ffill()                # copy the last known values down
          .stack()                # unpivot: ids back into rows
          .reset_index()
          .rename(columns={'level_0': 'date'})
)

print(df)

Which gives me:

         date  id  Value
0  1999-12-31   1    1.0
1  1999-12-31   2    0.5
2  1999-12-31   3    3.2
3  2000-01-01   1    1.0
4  2000-01-01   2    0.5
5  2000-01-01   3    3.2
6  2000-01-02   1    1.0
7  2000-01-02   2    0.5
8  2000-01-02   3    3.2
9  2000-01-03   1    1.0
10 2000-01-03   2    0.5
11 2000-01-03   3    3.2
12 2000-01-04   1    1.0
13 2000-01-04   2    0.7
14 2000-01-04   3    3.2
15 2000-01-05   1    1.0
16 2000-01-05   2    0.7
17 2000-01-05   3    3.2
18 2000-01-06   1    1.0
19 2000-01-06   2    0.7
20 2000-01-06   3    3.2

(I like chaining)
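
If the frame is already in the wide, forward-filled shape, the stack() and reset_index() steps at the end of the chain are the part that undoes the pivot. A minimal sketch, assuming a hypothetical two-row frame called `wide` whose index is named datadate and whose columns are named id:

```python
import pandas as pd

# Hypothetical wide frame shaped like the forward-filled table in the question
wide = pd.DataFrame(
    {"01": [1.0, 1.0], "02": [0.5, 0.7], "03": [3.2, 3.2]},
    index=pd.to_datetime(["2000-01-03", "2000-01-04"]),
)
wide.index.name = "datadate"
wide.columns.name = "id"

# stack() moves the column labels into the index, giving one row per (date, id);
# rename() names the resulting Series so reset_index() yields a "Value" column
long_df = wide.stack().rename("Value").reset_index()

print(long_df)
```

Naming the index and the columns before stacking is what lets reset_index() produce datadate and id columns directly, so no rename of level_0 is needed afterwards.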

You can achieve this by passing the proper arguments to the melt function, like this:

     datedate  01   02   03
0  1999-12-31   1  0.5  3.2
1  2000-01-01   1  0.5  3.2
2  2000-01-02   1  0.5  3.2
3  2000-01-03   1  0.5  3.2
4  2000-01-04   1  0.5  3.2

df_unpivoted = df.melt(id_vars=['datedate'], var_name='id', value_name='value')

      datedate  id  value
0   1999-12-31  01    1.0
1   2000-01-01  01    1.0
2   2000-01-02  01    1.0
3   2000-01-03  01    1.0
4   2000-01-04  01    1.0
5   1999-12-31  02    0.5
6   2000-01-01  02    0.5
7   2000-01-02  02    0.5
8   2000-01-03  02    0.5
9   2000-01-04  02    0.5
10  1999-12-31  03    3.2
11  2000-01-01  03    3.2
12  2000-01-02  03    3.2
13  2000-01-03  03    3.2
14  2000-01-04  03    3.2
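
A self-contained version of the melt approach might look like the sketch below. The frame `wide` is hypothetical sample data, and the date is assumed to be an ordinary column (named datadate here, as in the question); if it is the index instead, call reset_index() first. Because melt returns the rows grouped by id, a sort is added to get back the date-by-date order of the original data.

```python
import pandas as pd

# Hypothetical wide frame with the date as an ordinary column
wide = pd.DataFrame({
    "datadate": pd.to_datetime(["2000-01-03", "2000-01-04"]),
    "01": [1.0, 1.0],
    "02": [0.5, 0.7],
    "03": [3.2, 3.2],
})

long_df = (
    wide.melt(id_vars=["datadate"], var_name="id", value_name="Value")
        .sort_values(["datadate", "id"])   # melt groups rows by id; re-sort by date
        .reset_index(drop=True)
)

print(long_df)
```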

You can find a more detailed example at the following link:

https://dfrieds.com/data-analysis/melt-unpivot-python-pandas
