I would like to have a solution for my problem with minimum effort.
Question:
I have a list of delimited values. I would like to split them and place each value in the appropriate cell. The column headings should also be populated.
A,B,C
C,D,A,E
D,E
+-------+-------+-------+-------+-------+
| VLUE1 | VLUE2 | VLUE3 | VLUE4 | VLUE5 |
+-------+-------+-------+-------+-------+
| A     | B     | C     |       |       |
| A     |       | C     | D     | E     |
|       |       |       | D     | E     |
+-------+-------+-------+-------+-------+
I have a solution using sorting, key-value pairs in Python, and iteration, but I would like to know whether there is a shortcut using Python packages or pandas.
-Sam
Starting with a series -
s
0 A,B,C
1 C,D,A,E
2 D,E
dtype: object
Convert s to a one-hot encoded (OHE) matrix using get_dummies -
x = s.str.get_dummies(sep=',')
x
   A  B  C  D  E
0  1  1  1  0  0
1  1  0  1  1  1
2  0  0  0  1  1
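To make clear what get_dummies produces here, the following is a minimal sketch that rebuilds the same indicator matrix by hand and compares the two. The names tokens and manual are illustrative only and are not part of the answer; s is assumed to hold the three rows from the question.

```python
import pandas as pd

# The three delimited rows from the question.
s = pd.Series(["A,B,C", "C,D,A,E", "D,E"])
x = s.str.get_dummies(sep=",")

# Hand-rolled equivalent (illustrative): sorted distinct tokens,
# then a 0/1 membership test for every row and token.
tokens = sorted({t for row in s for t in row.split(",")})
manual = [[int(t in row.split(",")) for t in tokens] for row in s]

print(list(x.columns) == tokens)
print(x.to_numpy().tolist() == manual)
```

The hand-rolled version is only there for comparison; get_dummies does the same work in one call.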
Use this to create a new dataframe using string repetition through array multiplication (a label times 1 keeps the label, a label times 0 gives an empty string) -
import numpy as np
import pandas as pd

v = x.mul(x.columns).values
c = np.arange(1, x.shape[1] + 1)
df = pd.DataFrame(v, columns=c).add_prefix('VLUE')
df
  VLUE1 VLUE2 VLUE3 VLUE4 VLUE5
0     A     B     C
1     A           C     D     E
2                       D     E
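The multiplication step depends on pandas multiplying integers by string labels, which may not behave the same way on every pandas version. As a hedged alternative, and not the method used in the answer, here is a self-contained sketch that gets the same layout with np.where. The variable labels is an illustrative name.

```python
import numpy as np
import pandas as pd

# The three delimited rows from the question.
s = pd.Series(["A,B,C", "C,D,A,E", "D,E"])
x = s.str.get_dummies(sep=",")

# Keep the column label where the indicator is 1, otherwise an empty string.
# (np.where is swapped in here for the x.mul(x.columns) step.)
labels = x.columns.to_numpy(dtype=object)
v = np.where(x.to_numpy() == 1, labels, "")

# Number the output columns 1..n and prefix them, as in the answer.
df = pd.DataFrame(v, columns=np.arange(1, x.shape[1] + 1)).add_prefix("VLUE")
print(df)
```

Each token ends up in the column for its position in the sorted list of distinct tokens, so the column count equals the number of distinct tokens in the input.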
As far as I know, get_dummies is the fastest. Here is my attempt with value_counts and masking, i.e.:
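# df is assumed to be a DataFrame whose column 0 holds the delimited strings.
# pd.value_counts is deprecated in recent pandas; pd.Series.value_counts is the equivalent.
mask = df[0].str.split(',', expand=True).apply(pd.Series.value_counts, axis=1).notna()
pd.DataFrame(np.where(mask, mask.columns, '')).add_prefix('VALU')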
  VALU0 VALU1 VALU2 VALU3 VALU4
0     A     B     C
1     A           C     D     E
2                       D     E
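For reference, a self-contained sketch of the masking approach, broken into steps. It assumes the input lives in column 0 of a DataFrame, as the snippet implies. The names parts, counts and out are illustrative, and the explicit sort_index call is an addition that fixes the column order.

```python
import numpy as np
import pandas as pd

# Assumed input: the question's rows stored in column 0.
df = pd.DataFrame({0: ["A,B,C", "C,D,A,E", "D,E"]})

# One token per cell; shorter rows are padded with missing values.
parts = df[0].str.split(",", expand=True)

# Count tokens per row; a token absent from a row becomes NaN.
counts = parts.apply(pd.Series.value_counts, axis=1)
counts = counts.sort_index(axis=1)  # make the column order explicit
mask = counts.notna()

# Put the token name where it occurs, an empty string elsewhere.
labels = mask.columns.to_numpy(dtype=object)
out = pd.DataFrame(np.where(mask.to_numpy(), labels, "")).add_prefix("VALU")
print(out)
```

The columns start at VALU0 because the default integer column labels of the new DataFrame start at 0.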