
Create a column with boolean values for each label of a column

I have a table with a column reporting a tag for each line. How can I create a column for each tag and add a boolean value to each column containing the tag?

This is my input table, stored in the file input.csv:

COL1 COL2 COL3 TAG
12    13   21   a
15    23   31   b
32    33   31   a
15    53   31   a
18    26   31   c
17    63   31   d
12    25   31   a
1     93   31   a
13    25   31   a

and this is what I am aiming to obtain

COL1 COL2 COL3  a  b  c  d  ...
12    13   21   1  0  0  0
15    23   31   0  1  0  0 
32    33   31   1  0  0  0
15    53   31   1  0  0  0
18    26   31   0  0  1  0
17    63   31   0  0  0  1
12    25   31   1  0  0  0
1     93   31   1  0  0  0
13    25   31   1  0  0  0

I tried to use pandas without success... Here is the piece of code I wrote

import pandas

column_to_replicate='tag'

df = pandas.read_csv("data.csv")
col_names = df[column_to_replicate].dropna().unique().tolist()
df[col_names] = pd.get_dummies(df[column_to_replicate])

@anky_91's answer works!

df = df.join(df.pop('TAG').str.get_dummies())
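To see what this one-liner does step by step, here is a minimal sketch. It assumes the first six rows of the question's table, built inline instead of being read from input.csv; the name `result` is only illustrative.

```python
import pandas as pd

# Sample rows from the question, built inline (assumption: the real data
# comes from input.csv with the same column names).
df = pd.DataFrame({
    "COL1": [12, 15, 32, 15, 18, 17],
    "COL2": [13, 23, 33, 53, 26, 63],
    "COL3": [21, 31, 31, 31, 31, 31],
    "TAG": ["a", "b", "a", "a", "c", "d"],
})

# pop() removes TAG from df in place and returns it as a Series.
# Series.str.get_dummies() builds one 0/1 column per distinct tag.
# join() attaches those indicator columns back to df by index.
result = df.join(df.pop("TAG").str.get_dummies())
print(result)
```

Note that `pop` mutates `df`, so the TAG column is gone afterwards. If the original column should be kept, `df.join(df["TAG"].str.get_dummies())` is a possible variant.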

What you're looking for is called one-hot encoding. The pandas function get_dummies produces it:

import pandas as pd

# df is the DataFrame loaded from the input file; 'TAG' holds the labels
one_hot_encoded = pd.get_dummies(df['TAG'])
one_hot_encoded.head()
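On its own, get_dummies returns only the indicator columns, so they still have to be attached to COL1..COL3 to match the layout asked for. A minimal sketch of one way to do that, assuming a few rows of the question's table built inline; `dtype=int` is passed because, depending on the pandas version, the default indicator type may be boolean rather than 0/1 integers.

```python
import pandas as pd

# Sample rows from the question, built inline (assumption: the real data
# comes from input.csv with the same column names).
df = pd.DataFrame({
    "COL1": [12, 15, 18, 17],
    "COL2": [13, 23, 26, 63],
    "COL3": [21, 31, 31, 31],
    "TAG": ["a", "b", "c", "d"],
})

# One indicator column per distinct tag, as integers 0/1.
one_hot_encoded = pd.get_dummies(df["TAG"], dtype=int)

# Drop the original TAG column and place the indicators next to the rest.
result = pd.concat([df.drop(columns="TAG"), one_hot_encoded], axis=1)
print(result)
```

Unlike the `pop`-based one-liner, this leaves `df` itself unchanged.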
