How to split multiple values from a dataframe column into separate columns

Question

I have a column that holds multiple comma-separated values per row. I want to split the unique values into separate columns with headers and then apply a label encoder or one-hot encoder (I have not decided yet), because I am working on a multi-label text classification problem.

I tried

df['labels1'] = df['labels1'].str.split(',', expand=True)

but it keeps only the first item. Before splitting, I also tried to change the column's type, but I could not get that to work.

id
0           Politics, Journals, International
1                  Social, Blogs, Celebrities
2                         Media, Blogs, Video
3                         Food&Drink, Cooking
4                         Media, Blogs, Video
5                                     Culture
6                            Social, TV Shows
7                       News, Crime, National
8                  Social, Blogs, Celebrities
9                  Social, Blogs, Celebrities
10                 Social, Blogs, Celebrities
11                              Family, Blogs
12                        Media, Blogs, Video
13                           Social, TV Shows
14                    Entertainment, TV Shows
15                           Social, TV Shows
16                 Social, Blogs, Celebrities

Answer 1

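It seems the right-hand side, df['labels1'].str.split(',', expand=True), returns a dataframe with one column per split item (up to three for this data), so it cannot go into a single column. Perhaps, in a recent pandas version, you can assign it to a list of new columns instead:

df[['newcolumn1', 'newcolumn2', 'newcolumn3']] = df['labels1'].str.split(', ', expand=True)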

Answer 2

You are trying to set a single dataframe column from a three-column dataframe, which unfortunately is done silently by keeping only the first column.
Perhaps you could concatenate the three expanded columns to the original dataframe

df = pd.concat([df, df['labels1'].str.split(', ', expand=True)], axis=1)

or perhaps keep the expanded columns in a new dataframe and continue from there

df_exp = df['labels1'].str.split(', ', expand=True)
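
For illustration, here is a minimal sketch of both options on a few rows copied from the question. The column names label_0, label_1, label_2 and the variable df_wide are assumptions made for this example, not something from the original data.

```python
import pandas as pd

# A few rows copied from the question's sample data
df = pd.DataFrame({'labels1': ['Politics, Journals, International',
                               'Food&Drink, Cooking',
                               'Culture']})

# One column per comma-separated position; shorter rows are padded with missing values
df_exp = df['labels1'].str.split(', ', expand=True)

# Hypothetical names for the new columns: label_0, label_1, label_2
df_exp = df_exp.add_prefix('label_')

# Attach the expanded columns next to the original one
df_wide = pd.concat([df, df_exp], axis=1)
print(df_wide)
```

Note that this layout is positional: the same label can land in different columns depending on where it appears in the string, so it is probably not what you want to feed to an encoder directly.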

Edit:

IIUC, your binary table can be created like this (though I don't know if this is the recommended way to do it):

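# Split each row into a list of labels (no expand, so no None padding for shorter rows)
split_labels = df['labels1'].str.split(', ')

# Distinct labels in a stable order (a sorted list rather than a set)
col_head = sorted({label for labels in split_labels for label in labels})

bin_tbl = pd.DataFrame(index=df.index)

# One boolean column per label: True where the row contains that label
for c in col_head:
    bin_tbl[c] = split_labels.apply(lambda x: c in x)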
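
As an alternative to the loop, pandas has a built-in string method, str.get_dummies, that seems to cover this case directly. A minimal sketch on three sample rows, assuming the separator is always a comma followed by a single space:

```python
import pandas as pd

# A few rows copied from the question's sample data
df = pd.DataFrame({'labels1': ['Social, Blogs, Celebrities',
                               'Media, Blogs, Video',
                               'Culture']})

# Splits each string on the separator and returns one 0/1 column per distinct label
bin_tbl = df['labels1'].str.get_dummies(sep=', ')
print(bin_tbl)
```

The result holds integers (0/1) rather than booleans, which is usually the form a multi-label classifier expects as its target matrix.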
