Pandas DataFrame: Spread CSV columns to multiple columns
I have a pandas DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame([['a', 2, 3], ['a,b', 5, 6], ['c', 8, 9]])
>>> df
     0  1  2
0    a  2  3
1  a,b  5  6
2    c  8  9
I want to spread the first column to n columns (where n is the number of unique, comma-separated values, in this case 3). Each of the resulting columns should be 1 if the value is present, and 0 otherwise. The expected result is:
   1  2  a  c  b
0  2  3  1  0  0
1  5  6  1  0  1
2  8  9  0  1  0
I came up with the following code, but it seems a bit circuitous to me.
>>> import re
>>> dfSpread = pd.get_dummies(df[0].str.split(',', expand=True)).\
...     rename(columns=lambda x: re.sub('.*_', '', x))
>>> pd.concat([df.iloc[:,1:], dfSpread], axis = 1)
Is there a built-in function that does just this and that I wasn't able to find?
Using str.get_dummies:
df.set_index([1,2])[0].str.get_dummies(',').reset_index()
Out[229]:
   1  2  a  b  c
0  2  3  1  0  0
1  5  6  1  1  0
2  8  9  0  0  1
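To make it clearer what the one-liner above relies on, here is a minimal sketch of Series.str.get_dummies on its own, applied to the same values as column 0 of the question's DataFrame. The variable names s and dummies are only illustrative.

```python
import pandas as pd

# Same values as column 0 in the question's DataFrame.
s = pd.Series(['a', 'a,b', 'c'])

# Split each string on ',' and build one indicator column per unique value.
# The resulting columns come back in sorted order: a, b, c.
dummies = s.str.get_dummies(sep=',')
print(dummies)
```

The set_index([1, 2]) / reset_index() pair in the answer simply carries the other two columns along as the index while the indicator columns are built.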
You can use pop + concat here as an alternative version of Wen's answer.
pd.concat([df, df.pop(df.columns[0]).str.get_dummies(sep=',')], axis=1)
   1  2  a  b  c
0  2  3  1  0  0
1  5  6  1  1  0
2  8  9  0  0  1
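One thing to keep in mind is that pop removes column 0 from df in place. If the original DataFrame should stay untouched, a possible variant (a sketch, not part of either answer) is to combine drop and join instead. The names dummies and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([['a', 2, 3], ['a,b', 5, 6], ['c', 8, 9]])

# Build the 0/1 indicator columns from the comma-separated values in column 0.
dummies = df[0].str.get_dummies(sep=',')

# drop returns a new DataFrame, so df keeps all three original columns;
# join aligns the indicator columns on the shared row index.
result = df.drop(columns=[0]).join(dummies)
print(result)
```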