Efficiently creating multiple masks from a pandas Series
Given a series that looks like:
0    foo
1    bar
2    foo
3    foo
4    bar
5    baz
How can I create a dataframe where each column is a mask for a unique value in the series? In this example, it would look like:
     foo    bar    baz
0   True  False  False
1  False   True  False
2   True  False  False
3   True  False  False
4  False   True  False
5  False  False   True
Using get_dummies:
s.str.get_dummies().astype(bool)
Out[392]:
     bar    baz    foo
0  False  False   True
1   True  False  False
2  False  False   True
3  False  False   True
4   True  False  False
5  False   True  False
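Two details of the get_dummies approach are worth knowing: s.str.get_dummies() splits each string on "|" by default, and the resulting columns come out sorted rather than in order of first appearance. The following is a minimal sketch, assuming the values are plain labels that should not be split, which uses the top-level pd.get_dummies instead and reorders the columns with s.unique(). The variable name masks is only illustrative.

```python
import pandas as pd

s = pd.Series(["foo", "bar", "foo", "foo", "bar", "baz"])

# pd.get_dummies treats each value as one whole label (no "|" splitting).
# astype(bool) keeps the result boolean regardless of the pandas version's
# default dummy dtype.
masks = pd.get_dummies(s).astype(bool)

# Columns are sorted (bar, baz, foo); reorder them by first appearance.
masks = masks[list(s.unique())]

print(masks)
```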
Or we can try something different, crosstab:
pd.crosstab(s.index,s).astype(bool)
Out[395]:
a        bar    baz    foo
row_0
0      False  False   True
1       True  False  False
2      False  False   True
3      False  False   True
4       True  False  False
5      False   True  False
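pd.crosstab builds a table of counts, so astype(bool) turns every non-zero count into True. It also labels the axes (row_0 for the unnamed index, and the series name for the columns). Here is a small sketch, assuming the series is named "a" to mirror the output above, that clears those labels with rename_axis:

```python
import pandas as pd

# The name "a" is an assumption chosen to match the column label shown above.
s = pd.Series(["foo", "bar", "foo", "foo", "bar", "baz"], name="a")

# Count how often each value occurs at each index position (0 or 1 here).
counts = pd.crosstab(s.index, s)

# Convert counts to booleans and drop the "row_0" / "a" axis labels.
masks = counts.astype(bool).rename_axis(index=None, columns=None)

print(masks)
```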
Here's one with array initialization:
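import numpy as np
import pandas as pd

def series_hotencode(s):
    # a: integer code per row; b: unique values in order of first appearance
    a, b = s.factorize()
    ar = np.zeros((len(a), len(b)), dtype=bool)
    # For each row, set the column that matches its code to True
    ar[np.arange(len(a)), a] = True
    return pd.DataFrame(ar, columns=b)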
Sample run:
In [40]: s
Out[40]:
0    foo
1    bar
2    foo
3    foo
4    bar
5    baz
Name: 1, dtype: object
In [41]: series_hotencode(s)
Out[41]:
     foo    bar    baz
0   True  False  False
1  False   True  False
2   True  False  False
3   True  False  False
4  False   True  False
5  False  False   True
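One caveat with the array-initialization approach: factorize marks missing values with the code -1, and indexing a column with -1 would set the last column to True for those rows. The sketch below is a possible variant that skips missing rows and also keeps the original index. The name series_hotencode_safe is hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

def series_hotencode_safe(s):
    # codes: one integer per row (-1 for missing values); uniques: the labels
    codes, uniques = pd.factorize(s)
    mask = np.zeros((len(codes), len(uniques)), dtype=bool)
    # Only rows with a real code get a True; missing rows stay all False
    valid = codes >= 0
    mask[np.flatnonzero(valid), codes[valid]] = True
    return pd.DataFrame(mask, index=s.index, columns=uniques)

s_with_gap = pd.Series(["foo", None, "bar", "foo"])
result = series_hotencode_safe(s_with_gap)
print(result)
```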
Let's try pd.factorize + np.eye for a fast, concise solution.
x,y = pd.factorize(s)
pd.DataFrame(np.eye(len(y), dtype=bool)[x], columns=y)
     foo    bar    baz
0   True  False  False
1  False   True  False
2   True  False  False
3   True  False  False
4  False   True  False
5  False  False   True
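The reason this works: row i of an identity matrix is True only in column i, so indexing the identity matrix with the factorized codes picks the matching one-hot row for every element. A self-contained sketch of the same idea follows. The names codes, uniques, identity and masks are illustrative, and passing index=s.index is an addition that keeps the original index.

```python
import numpy as np
import pandas as pd

s = pd.Series(["foo", "bar", "foo", "foo", "bar", "baz"])

# codes: position of each element's value within uniques
codes, uniques = pd.factorize(s)

# Boolean identity matrix: row i is the mask pattern for uniques[i]
identity = np.eye(len(uniques), dtype=bool)

# Fancy indexing selects one identity row per element of the series
masks = pd.DataFrame(identity[codes], index=s.index, columns=uniques)

print(masks)
```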