Transpose a column in a DataFrame into a binary matrix

Question

Context

Let's say I have a pandas DataFrame like this:

>>> data.head()
                            values  atTime
date        
2006-07-01 00:00:00+02:00   15.10   0000
2006-07-01 00:15:00+02:00   16.10   0015
2006-07-01 00:30:00+02:00   17.75   0030
2006-07-01 00:45:00+02:00   17.35   0045
2006-07-01 01:00:00+02:00   17.25   0100

atTime represents the hour and minute of the timestamp used as the index. I want to transpose the atTime column into a binary matrix (making it sparse is also an option), which will be used as a nominal feature in a machine learning approach.

The desired result should look like this:

>>> data.head()
                            values  0000  0015  0030  0045  0100
date        
2006-07-01 00:00:00+02:00   15.10   1     0     0     0     0
2006-07-01 00:15:00+02:00   16.10   0     1     0     0     0
2006-07-01 00:30:00+02:00   17.75   0     0     1     0     0
2006-07-01 00:45:00+02:00   17.35   0     0     0     1     0
2006-07-01 01:00:00+02:00   17.25   0     0     0     0     1

As might be anticipated, this matrix will be much larger when considering all values in atTime.

My question

I can achieve the desired result with workarounds, using apply and the timestamps to create the new columns beforehand.

However, is there a built-in option in pandas (or via numpy, treating atTime as a numpy array) to achieve the same without a workaround?

Answer 1

This is a use case for get_dummies:

pd.get_dummies(df, columns=["atTime"])

                           values  atTime_0  atTime_15  atTime_30  atTime_45  atTime_100
date                                                                                    
2006-07-01 00:00:00+02:00   15.10         1          0          0          0           0
2006-07-01 00:15:00+02:00   16.10         0          1          0          0           0
2006-07-01 00:30:00+02:00   17.75         0          0          1          0           0
2006-07-01 00:45:00+02:00   17.35         0          0          0          1           0
2006-07-01 01:00:00+02:00   17.25         0          0          0          0           1

Solution updated with OP's recommendation. Thanks!
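
For anyone who wants to reproduce this, here is a minimal sketch rather than a definitive recipe. It assumes atTime is stored as a string (so leading zeros such as "0015" survive), rebuilds the five sample rows without the timezone offset for simplicity, and uses the names `encoded` and `encoded_sparse` purely for illustration. Passing `prefix=""` and `prefix_sep=""` should give the bare column names from the question, and `dtype=int` asks for 0/1 values, since pandas 2.0 and later return boolean columns by default.

```python
import pandas as pd

# Rebuild the sample rows from the question (timezone omitted for simplicity).
index = pd.date_range("2006-07-01 00:00", periods=5, freq="15min", name="date")
data = pd.DataFrame({"values": [15.10, 16.10, 17.75, 17.35, 17.25]}, index=index)

# Assumption: atTime is the zero-padded hour and minute of the index, as a string.
data["atTime"] = data.index.strftime("%H%M")

# One indicator column per distinct atTime value.
# prefix="" and prefix_sep="" drop the "atTime_" prefix from the column names;
# dtype=int yields 0/1 instead of the boolean default of recent pandas versions.
encoded = pd.get_dummies(
    data, columns=["atTime"], prefix="", prefix_sep="", dtype=int
)
print(encoded)

# Sparse variant: the indicator columns are backed by sparse arrays.
encoded_sparse = pd.get_dummies(data, columns=["atTime"], sparse=True)
print(encoded_sparse.dtypes)
```

With a full day of 15-minute timestamps this would produce 96 indicator columns, which is where the sparse variant may be worth considering.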

Solution 1
8 Accepted 2019-06-19 16:25:31