How to split string column into categorical columns in pandas

Question

I have a string column containing something like this

Col1
ind1,ind2,ind3
ind1,ind5,ind3
ind2,ind3,ind5,ind4

I want to split it to the following columns:

ind_1	ind_2	ind_3	ind_4	ind_5
ind1	ind2	ind3
ind1		ind3		ind5
	ind2	ind3	ind4	ind5

using .str.rsplit(',', expand=True) does not order the same strings in the same column.

Answer 1

Explode your column then pivot your dataframe:

out = df['Col1'].str.split(',').explode().reset_index()
out = (out.pivot('index', 'Col1', 'Col1').fillna('')
          .rename_axis(index=None, columns=None))
print(out)

# Output
   ind1  ind2  ind3  ind4  ind5
0  ind1  ind2  ind3            
1  ind1        ind3        ind5
2        ind2  ind3  ind4  ind5

Answer 2

use df.column.str.get_dummies with seperator as ","

import pandas as pd

df = pd.DataFrame({
    "col1" : ["ind1,ind2,ind3", "ind1,ind5,ind3", "ind2,ind3,ind5,ind4"]
})
df.head()

# output
    col1
0   ind1,ind2,ind3
1   ind1,ind5,ind3
2   ind2,ind3,ind5,ind4

df  = pd.concat([df,df.col1.str.get_dummies(sep = ",")], axis =1)
df

# output
    col1                ind1    ind2    ind3    ind4    ind5
0   ind1,ind2,ind3      1       1       1       0       0
1   ind1,ind5,ind3      1       0       1       0       1
2   ind2,ind3,ind5,ind4 0       1       1       1       1

Answer 3

我想我找到了解决方案，甚至返回了二进制结果：

df.join(df.Col1.str.get_dummies(',').apply(lambda x: np.where(x == 1, 1, 0))).drop(columns=['Col1'])

How to split string column into categorical columns in pandas

Question

3 answers

solution1
2 2022-06-13 13:51:53

solution2
2 ACCPTED 2022-06-13 13:58:50

solution3
0 2022-06-13 14:01:19

How to split string column into categorical columns in pandas

Question

3 answers

solution1 2 2022-06-13 13:51:53

solution2 2 ACCPTED 2022-06-13 13:58:50

solution3 0 2022-06-13 14:01:19

solution1
2 2022-06-13 13:51:53

solution2
2 ACCPTED 2022-06-13 13:58:50

solution3
0 2022-06-13 14:01:19