
Transpose column in a DataFrame into a binary matrix

Context

Let's say I have a pandas DataFrame like this:

>>> data.head()
                            values  atTime
date        
2006-07-01 00:00:00+02:00   15.10   0000
2006-07-01 00:15:00+02:00   16.10   0015
2006-07-01 00:30:00+02:00   17.75   0030
2006-07-01 00:45:00+02:00   17.35   0045
2006-07-01 01:00:00+02:00   17.25   0100

atTime represents the hour and minute of the timestamp used as the index. I want to transpose the atTime column into a binary matrix (making it sparse is also an option), which will be used as a nominal feature in a machine-learning approach.
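To experiment with the answers below, a frame shaped like the sample can be rebuilt from its index. This is a minimal sketch that assumes a fixed +02:00 offset and a 15-minute spacing (both read off the sample above) and stores atTime as a zero-padded "HHMM" string derived with DatetimeIndex.strftime; the OP's real atTime column may be stored differently (for example as an integer).

```python
import pandas as pd

# Assumption: fixed +02:00 offset and 15-minute spacing, as in the sample above.
index = pd.date_range("2006-07-01 00:00:00+02:00", periods=5, freq="15min", name="date")
data = pd.DataFrame({"values": [15.10, 16.10, 17.75, 17.35, 17.25]}, index=index)

# Hour and minute of each index timestamp as a zero-padded "HHMM" string
data["atTime"] = data.index.strftime("%H%M")
print(data)
```

Keeping atTime as a string preserves the leading zeros ("0000", "0015", ...), which then carry over into the column names of the binary matrix.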

The desired result should look like this:

>>> data.head()
                            values  0000  0015  0030  0045  0100
date        
2006-07-01 00:00:00+02:00   15.10   1     0     0     0     0
2006-07-01 00:15:00+02:00   16.10   0     1     0     0     0
2006-07-01 00:30:00+02:00   17.75   0     0     1     0     0
2006-07-01 00:45:00+02:00   17.35   0     0     0     1     0
2006-07-01 01:00:00+02:00   17.25   0     0     0     0     1

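As might be expected, this matrix will be much larger once all values of atTime are considered: if readings are taken every 15 minutes, as in the sample, there can be up to 96 distinct times of day and therefore up to 96 binary columns.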
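The width of the binary matrix depends on the number of distinct atTime values, not on the number of rows. A small sketch, assuming three days of regular 15-minute readings in a fixed +02:00 offset (so no daylight-saving gaps):

```python
import pandas as pd

# Assumption: three days of regular 15-minute readings, fixed +02:00 offset.
stamps = pd.date_range("2006-07-01 00:00:00+02:00", periods=3 * 24 * 4, freq="15min")
at_time = pd.Series(stamps.strftime("%H%M"))

n_rows = len(at_time)          # one row per reading
n_columns = at_time.nunique()  # one binary column per distinct time of day
print(n_rows, n_columns)
```

Adding more days adds rows, but the column count stays at 24 × 4 = 96.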


My question

I can achieve the desired result with workarounds: using apply, and using the timestamps to create the new columns beforehand.
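The OP's workaround code is not shown, so the following is only a hypothetical illustration of what an apply-based approach could look like: collect the distinct times first, then build one 0/1 row per record. The helper one_hot_row is an illustrative name, not part of pandas.

```python
import pandas as pd

index = pd.date_range("2006-07-01 00:00:00+02:00", periods=5, freq="15min", name="date")
data = pd.DataFrame({"values": [15.10, 16.10, 17.75, 17.35, 17.25]}, index=index)
data["atTime"] = data.index.strftime("%H%M")

# Hypothetical workaround: decide the new columns beforehand ...
times = sorted(data["atTime"].unique())

def one_hot_row(row):
    # ... then put a 1 in the column matching this row's atTime, 0 elsewhere
    return pd.Series([int(row["atTime"] == t) for t in times], index=times)

result = data[["values"]].join(data.apply(one_hot_row, axis=1))
print(result)
```

This calls a Python function once per row, which is why it scales poorly compared with a vectorized built-in.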

However, is there a built-in option in pandas (or in NumPy, treating atTime as a NumPy array) to achieve the same without a workaround?
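For the NumPy side of the question, one common vectorized pattern (a sketch, not something proposed in the original thread) is np.unique with return_inverse=True to get an integer code per entry, then selecting rows of an identity matrix with those codes. An equivalent broadcasting comparison is shown as well.

```python
import numpy as np

at_time = np.array(["0000", "0015", "0030", "0045", "0100"])

# categories: sorted distinct values; codes[i]: position of at_time[i] in categories
categories, codes = np.unique(at_time, return_inverse=True)

# Row i of the identity matrix has a single 1 in column i, so indexing by codes
# gives each entry a 1 in the column of its own category
binary = np.eye(len(categories), dtype=np.int8)[codes]

# Same result via broadcasting: compare every entry against every category
binary_alt = (at_time[:, None] == categories).astype(np.int8)
print(categories)
print(binary)
```

The resulting array has no column labels; categories gives the label for each column.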

This is a use case for get_dummies:

pd.get_dummies(data, columns=["atTime"], dtype=int)
                           values  atTime_0  atTime_15  atTime_30  atTime_45  atTime_100
date                                                                                    
2006-07-01 00:00:00+02:00   15.10         1          0          0          0           0
2006-07-01 00:15:00+02:00   16.10         0          1          0          0           0
2006-07-01 00:30:00+02:00   17.75         0          0          1          0           0
2006-07-01 00:45:00+02:00   17.35         0          0          0          1           0
2006-07-01 01:00:00+02:00   17.25         0          0          0          0           1

Solution updated with OP's recommendation. Thanks!
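Two get_dummies parameters are relevant to the question: dtype controls the type of the indicator columns (since pandas 2.0 the default is bool, so dtype=int is needed for 0/1 output like the table above), and sparse=True stores the indicator columns as sparse arrays, which the question mentions as an option. A sketch using a rebuilt sample frame (fixed +02:00 offset and string atTime are assumptions):

```python
import pandas as pd

index = pd.date_range("2006-07-01 00:00:00+02:00", periods=5, freq="15min", name="date")
data = pd.DataFrame({"values": [15.10, 16.10, 17.75, 17.35, 17.25]}, index=index)
data["atTime"] = data.index.strftime("%H%M")

# Dense 0/1 indicator columns named atTime_<value>
dense = pd.get_dummies(data, columns=["atTime"], dtype=int)

# Same columns stored as SparseArrays (fill value 0), which saves memory
# when there are many distinct atTime values
sparse = pd.get_dummies(data, columns=["atTime"], sparse=True, dtype=int)

print(dense.columns.tolist())
print(sparse.dtypes)
```

With string atTime values the column names keep their leading zeros (atTime_0000, atTime_0015, ...); the answer's output (atTime_0, atTime_15, ...) suggests the OP's column held integers.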
