
How to split multiple values from a dataframe column into separate columns

I have a column with multiple values. I want to split the unique values into multiple columns with headers and then apply Label Encoder or One Hot Encoder (I don't know which yet), because I have a multi-label text classification problem to solve.

I tried:

df['labels1'] = df['labels1'].str.split(',', expand=True) 

but the column ends up holding only the first item. Also, before trying to split the column, I tried to change its type, but I didn't manage to.

id
0           Politics, Journals, International
1                  Social, Blogs, Celebrities
2                         Media, Blogs, Video
3                         Food&Drink, Cooking
4                         Media, Blogs, Video
5                                     Culture
6                            Social, TV Shows
7                       News, Crime, National
8                  Social, Blogs, Celebrities
9                  Social, Blogs, Celebrities
10                 Social, Blogs, Celebrities
11                              Family, Blogs
12                        Media, Blogs, Video
13                           Social, TV Shows
14                    Entertainment, TV Shows
15                           Social, TV Shows
16                 Social, Blogs, Celebrities
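To see why the assignment loses data, it can help to run the split on a few of the rows above and look at what comes back. This is a minimal sketch that rebuilds three sample rows by hand (only the column name labels1 comes from the question) and inspects the result of str.split(',', expand=True):

```python
import pandas as pd

# Three rows hand-copied from the sample data above
df = pd.DataFrame(
    {"labels1": ["Politics, Journals, International", "Food&Drink, Cooking", "Culture"]}
)

expanded = df["labels1"].str.split(",", expand=True)

# One column per split position, sized to the longest row
print(expanded.shape)  # (3, 3)

# Shorter rows are padded with missing values
print(expanded.iloc[2].isna().tolist())  # [False, True, True]

# Splitting on "," alone keeps the space that followed each comma
print(repr(expanded.iloc[0, 1]))  # ' Journals'
```

So the right-hand side is a whole DataFrame, not a single column, and splitting on ', ' (comma plus space) is probably the better separator for this data.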

It seems like the right-hand side of the assignment, df['labels1'].str.split(',', expand=True), returns a dataframe with one column per item (three columns for this data). So perhaps you can assign it to the same number of new columns:

df[['newcolumn1', 'newcolumn2', 'newcolumn3']] = df['labels1'].str.split(', ', expand=True)
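A minimal runnable sketch of assigning the split into several new columns at once, assuming the same kind of sample rows as in the question. The newcolumnN names are hypothetical, and the list of names is built from the actual number of split columns rather than hard-coded, so it still works if some rows carry more labels:

```python
import pandas as pd

df = pd.DataFrame(
    {"labels1": ["Politics, Journals, International", "Food&Drink, Cooking", "Culture"]}
)

# Split on ", " so the labels carry no leading whitespace
parts = df["labels1"].str.split(", ", expand=True)

# Hypothetical names, one per split position (newcolumn1, newcolumn2, ...)
new_cols = [f"newcolumn{i + 1}" for i in range(parts.shape[1])]

# With a list of keys on the left, pandas assigns the columns of `parts` positionally
df[new_cols] = parts

print(df.columns.tolist())
# ['labels1', 'newcolumn1', 'newcolumn2', 'newcolumn3']
```

Rows with fewer labels get missing values in the trailing columns.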

You are trying to set a single dataframe column with a three-column dataframe, which (at least in older pandas versions) is silently done by keeping only the first column.
Perhaps you want to concatenate the three new expanded columns to the original dataframe:

df = pd.concat([df, df['labels1'].str.split(', ', expand=True)], axis=1)

or just continue with the expanded columns in a new dataframe:

df_exp = df['labels1'].str.split(', ', expand=True)

Edit:

IIUC, your binary table can be created like this (though I'm not sure this is the recommended way to do it):

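import pandas as pd

# One list of labels per row, e.g. "Social, TV Shows" -> ['Social', 'TV Shows']
split_labels = df.labels1.str.split(', ')

# Unique labels as a sorted list: no None from expand=True padding, stable column order
col_head = sorted({label for labels in split_labels for label in labels})
bin_tbl = pd.DataFrame(index=df.index, columns=col_head)

for c in col_head:
    # True where label c occurs in that row's list
    bin_tbl[c] = split_labels.apply(lambda x: c in x)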
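For comparison, pandas has a built-in for this kind of multi-hot table: Series.str.get_dummies splits each string on sep and returns one 0/1 column per unique label. A sketch on a few rows hand-copied from the question's data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "labels1": [
            "Politics, Journals, International",
            "Food&Drink, Cooking",
            "Culture",
            "Media, Blogs, Video",
        ]
    }
)

# One indicator column per unique label; columns come back sorted
bin_tbl = df["labels1"].str.get_dummies(sep=", ")

print(bin_tbl.columns.tolist())
# ['Blogs', 'Cooking', 'Culture', 'Food&Drink', 'International', 'Journals', 'Media', 'Politics', 'Video']

print(bin_tbl.loc[2].tolist())  # the "Culture" row
# [0, 0, 1, 0, 0, 0, 0, 0, 0]
```

It returns integers rather than the booleans the loop produces; bin_tbl.astype(bool) converts if needed. For a multi-label target, scikit-learn's MultiLabelBinarizer (in sklearn.preprocessing) builds the same kind of indicator matrix from lists of labels and may be a closer fit than LabelEncoder or OneHotEncoder.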
