I have a column with multiple comma-separated values. I want to split the unique values into multiple columns with headers and then apply a Label Encoder or One Hot Encoder (I haven't decided which yet), because I have a multi-label text classification problem to solve.
I tried:
df['labels1'] = df['labels1'].str.split(',', expand=True)
but only the first item ends up in the column. Also, before trying to split the column I tried to change its type, but I didn't manage to.
id labels1
0 Politics, Journals, International
1 Social, Blogs, Celebrities
2 Media, Blogs, Video
3 Food&Drink, Cooking
4 Media, Blogs, Video
5 Culture
6 Social, TV Shows
7 News, Crime, National
8 Social, Blogs, Celebrities
9 Social, Blogs, Celebrities
10 Social, Blogs, Celebrities
11 Family, Blogs
12 Media, Blogs, Video
13 Social, TV Shows
14 Entertainment, TV Shows
15 Social, TV Shows
16 Social, Blogs, Celebrities
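To see why assigning the result to a single column cannot work, here is a minimal sketch on a few rows of the data above (assuming, as in the code, that the column is named labels1). With expand=True, str.split returns a DataFrame with one column per label position, padded with missing values where a row has fewer labels.

```python
import pandas as pd

# A few rows from the example data above
df = pd.DataFrame({"labels1": [
    "Politics, Journals, International",
    "Food&Drink, Cooking",
    "Culture",
]})

# Split on ", " so the labels carry no leading spaces
parts = df["labels1"].str.split(", ", expand=True)

print(type(parts).__name__)  # DataFrame, not Series
print(parts.shape)           # (3, 3): one column per label position
print(parts)                 # shorter rows are padded with missing values
```

Because the longest entry has three labels, the result has three columns, and a single target column has nowhere to put the other two.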
The right-hand side, df['labels1'].str.split(', ', expand=True), returns a DataFrame with one column per label position (three here, because the longest entry has three labels), so you need as many target columns on the left:
df[['newcolumn1', 'newcolumn2', 'newcolumn3']] = df['labels1'].str.split(', ', expand=True)
You are trying to set a single column with a three-column DataFrame, which, depending on your pandas version, either silently keeps only the first column or raises an error.
Instead, you could concatenate the three expanded columns to the original DataFrame:
df = pd.concat([df, df['labels1'].str.split(', ', expand=True)], axis=1)
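A runnable sketch of the concatenation on a few rows of the example data. The add_prefix call and the "label_" prefix are illustrative additions, not part of the original answer; they just replace the default positional column names 0, 1, 2 with readable headers. Passing axis=1 by keyword is clearer than the positional form, which recent pandas versions deprecate.

```python
import pandas as pd

df = pd.DataFrame({"labels1": [
    "Social, Blogs, Celebrities",
    "Family, Blogs",
    "Culture",
]})

# Expand into positional columns 0, 1, 2, then rename them label_0, label_1, label_2
# ("label_" is a hypothetical prefix chosen for illustration)
expanded = df["labels1"].str.split(", ", expand=True).add_prefix("label_")

# Put the new columns side by side with the original one
df = pd.concat([df, expanded], axis=1)
print(df)
```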
or simply keep working with the expanded columns in a new DataFrame:
df_exp = df['labels1'].str.split(', ', expand=True)
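If the end goal is one 0/1 indicator column per label (a multi-hot encoding, which is the usual target format for multi-label classification), positional columns are only an intermediate step: pandas' Series.str.get_dummies can build the indicator table directly from the original column. A minimal sketch on a few rows of the example data:

```python
import pandas as pd

df = pd.DataFrame({"labels1": [
    "Politics, Journals, International",
    "Social, Blogs, Celebrities",
    "Family, Blogs",
]})

# One column per distinct label (sorted alphabetically);
# 1 where the row contains that label, 0 otherwise
dummies = df["labels1"].str.get_dummies(sep=", ")
print(dummies)
```

Note that sep must match the separator exactly, including the space; otherwise the labels keep a leading space and "Blogs" and " Blogs" become different columns.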
Edit:
If I understand correctly, you want a binary table with one column per label. It can be created like this, although I'm not sure it is the recommended way:
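labels = df.labels1.str.split(', ')
# Distinct labels; explode avoids the None padding that expand=True adds, and sorted gives stable headers
col_head = sorted(set(labels.explode()))
bin_tbl = pd.DataFrame(index=df.index, columns=col_head)
for c in bin_tbl:
    bin_tbl[c] = labels.apply(lambda x: c in x)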
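On the Label Encoder vs. One Hot Encoder question: both assume one category per value, whereas here each row has several labels. For that case scikit-learn provides MultiLabelBinarizer, which takes a collection of labels per sample and returns the same kind of binary indicator matrix. A minimal sketch, assuming scikit-learn is installed:

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.DataFrame({"labels1": [
    "Media, Blogs, Video",
    "Social, TV Shows",
    "Entertainment, TV Shows",
]})

# MultiLabelBinarizer expects an iterable of label collections, one per sample
label_lists = df["labels1"].str.split(", ")

mlb = MultiLabelBinarizer()
y = mlb.fit_transform(label_lists)  # NumPy array of shape (n_samples, n_classes)

# Wrap it in a DataFrame with the learned (sorted) class names as headers
bin_tbl = pd.DataFrame(y, columns=mlb.classes_, index=df.index)
print(bin_tbl)

# inverse_transform maps indicator rows back to tuples of labels
print(mlb.inverse_transform(y))
```

Keeping the fitted mlb object lets you encode new data with mlb.transform using the same column order, which matters when the matrix is fed to a classifier.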