I have a string column containing something like this:
Col1 |
---|
ind1,ind2,ind3 |
ind1,ind5,ind3 |
ind2,ind3,ind5,ind4 |
I want to split it to the following columns:
| ind_1 | ind_2 | ind_3 | ind_4 | ind_5 |
|---|---|---|---|---|
| ind1 | ind2 | ind3 | | |
| ind1 | | ind3 | | ind5 |
| | ind2 | ind3 | ind4 | ind5 |
Using .str.rsplit(',', expand=True) does not put the same strings in the same column.
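To make the problem concrete, here is a minimal sketch of what a positional split produces on the sample data. The variable name `positional` is only illustrative, and `str.split` is used in place of `str.rsplit` since both give the same result when no split limit is set.

```python
import pandas as pd

df = pd.DataFrame(
    {"Col1": ["ind1,ind2,ind3", "ind1,ind5,ind3", "ind2,ind3,ind5,ind4"]}
)

# expand=True places each token by its position in the string,
# not by its value, so "ind5" lands in different columns per row
positional = df["Col1"].str.split(",", expand=True)
print(positional)
```

Row 1 has `ind5` in the second column while row 2 has it in the third, which is the misalignment described above.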
Explode your column then pivot your dataframe:
out = df['Col1'].str.split(',').explode().reset_index()
out = (out.pivot(index='index', columns='Col1', values='Col1').fillna('')
.rename_axis(index=None, columns=None))
print(out)
# Output
   ind1  ind2  ind3  ind4  ind5
0  ind1  ind2  ind3
1  ind1        ind3        ind5
2        ind2  ind3  ind4  ind5
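One caveat with the explode-and-pivot approach: `pivot` raises a `ValueError` when the same token appears twice in one row. The sketch below is one possible guard, assuming it is acceptable to keep a single copy of each repeated token; the input with a repeated `ind1` is hypothetical and not from the question.

```python
import pandas as pd

# Hypothetical input: the first row repeats "ind1"
df = pd.DataFrame({"Col1": ["ind1,ind2,ind1", "ind1,ind5,ind3"]})

# One row per token, with the original row label in the "index" column
long = df["Col1"].str.split(",").explode().reset_index()

# Keep a single (row, token) pair so pivot sees no duplicate entries
long = long.drop_duplicates()

out = (
    long.pivot(index="index", columns="Col1", values="Col1")
    .fillna("")
    .rename_axis(index=None, columns=None)
)
print(out)
```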
Use df.column.str.get_dummies with "," as the separator:
import pandas as pd
df = pd.DataFrame({
"col1" : ["ind1,ind2,ind3", "ind1,ind5,ind3", "ind2,ind3,ind5,ind4"]
})
df.head()
# output
col1
0 ind1,ind2,ind3
1 ind1,ind5,ind3
2 ind2,ind3,ind5,ind4
df = pd.concat([df,df.col1.str.get_dummies(sep = ",")], axis =1)
df
# output
col1 ind1 ind2 ind3 ind4 ind5
0 ind1,ind2,ind3 1 1 1 0 0
1 ind1,ind5,ind3 1 0 1 0 1
2 ind2,ind3,ind5,ind4 0 1 1 1 1
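If the label strings are wanted rather than 0/1 indicators, the dummy columns can be mapped back to their own names. This is a sketch of one way to do it, not part of the answer above; `dummies` and `labels` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": ["ind1,ind2,ind3", "ind1,ind5,ind3", "ind2,ind3,ind5,ind4"]}
)

# One 0/1 indicator column per distinct token
dummies = df["col1"].str.get_dummies(sep=",")

# Replace each 1 with the column's own name and each 0 with an empty string
labels = dummies.apply(lambda col: col.map({1: col.name, 0: ""}))
print(labels)
```

Because `get_dummies` records only presence, repeated tokens within a row collapse into a single indicator, so this variant does not fail on duplicates.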
I think I found a solution that even returns binary results:
import numpy as np
df.join(df.Col1.str.get_dummies(',').apply(lambda x: np.where(x == 1, 1, 0))).drop(columns=['Col1'])