Transpose column in a DataFrame into a binary matrix
Context
Let's say I have a pandas DataFrame like this:
>>> data.head()
values atTime
date
2006-07-01 00:00:00+02:00 15.10 0000
2006-07-01 00:15:00+02:00 16.10 0015
2006-07-01 00:30:00+02:00 17.75 0030
2006-07-01 00:45:00+02:00 17.35 0045
2006-07-01 01:00:00+02:00 17.25 0100
atTime represents the hour and minute of the timestamp used as the index. I want to transpose the atTime column into a binary matrix (making it sparse is also an option), which will be used as a nominal feature in a machine learning approach.
The desired result should look like:
>>> data.head()
values 0000 0015 0030 0045 0100
date
2006-07-01 00:00:00+02:00 15.10 1 0 0 0 0
2006-07-01 00:15:00+02:00 16.10 0 1 0 0 0
2006-07-01 00:30:00+02:00 17.75 0 0 1 0 0
2006-07-01 00:45:00+02:00 17.35 0 0 0 1 0
2006-07-01 01:00:00+02:00 17.25 0 0 0 0 1
As might be anticipated, this matrix will be much larger when considering all values in atTime.
My question
I can achieve the desired result with workarounds using apply and using the timestamps in order to create the new columns beforehand.
However, is there a built-in option in pandas (or via numpy, considering atTime as a numpy array) to achieve the same without a workaround?
This is a use case for get_dummies:
pd.get_dummies(df, columns=["atTime"])
values atTime_0 atTime_15 atTime_30 atTime_45 atTime_100
date
2006-07-01 00:00:00+02:00 15.10 1 0 0 0 0
2006-07-01 00:15:00+02:00 16.10 0 1 0 0 0
2006-07-01 00:30:00+02:00 17.75 0 0 1 0 0
2006-07-01 00:45:00+02:00 17.35 0 0 0 1 0
2006-07-01 01:00:00+02:00 17.25 0 0 0 0 1
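For anyone who wants to try this end to end, here is a minimal sketch. It rebuilds the five sample rows under the assumption that atTime is stored as zero-padded strings (the frame construction is a hypothetical reconstruction, not the asker's actual data). The prefix and prefix_sep arguments are set to empty strings so the new columns are named after the bare time labels, as in the desired result.

```python
import pandas as pd

# Hypothetical reconstruction of the sample frame from the question.
# atTime is kept as strings so the zero-padded labels ("0000", "0015", ...) survive.
index = pd.date_range("2006-07-01 00:00:00+02:00", periods=5, freq="15min", name="date")
data = pd.DataFrame(
    {
        "values": [15.10, 16.10, 17.75, 17.35, 17.25],
        "atTime": index.strftime("%H%M"),
    },
    index=index,
)

# prefix="" and prefix_sep="" drop the "atTime_" prefix from the new columns;
# dtype=int requests 0/1 integers rather than booleans.
encoded = pd.get_dummies(data, columns=["atTime"], prefix="", prefix_sep="", dtype=int)
print(encoded)
```

If atTime holds integers instead, the leading zeros are lost and the columns come out as atTime_0, atTime_15 and so on, as in the output above. get_dummies also accepts sparse=True, which may help once all 96 quarter-hour labels are present.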
Solution updated with OP's recommendation. Thanks!
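Since the question also asks about a numpy route, the following is a rough sketch of one way to build the same indicator matrix directly from an array. It is not part of the original answer; the at_time array is made-up sample data, and the approach (np.unique with return_inverse, followed by a broadcast comparison) is just one possible technique.

```python
import numpy as np

# Made-up sample values standing in for the atTime column.
at_time = np.array(["0000", "0015", "0030", "0045", "0100", "0000"])

# labels: sorted distinct values; codes: position of each entry within labels.
labels, codes = np.unique(at_time, return_inverse=True)

# Compare each code against every label position via broadcasting:
# one row per observation, one column per distinct label.
one_hot = (codes.reshape(-1, 1) == np.arange(labels.size)).astype(np.uint8)

print(labels)
print(one_hot)
```

The column order follows labels, so labels is needed to map columns back to times. Unlike get_dummies, this returns a bare array without the index or column names.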