I have a table with a column reporting a tag for each line. How can I create a column for each tag and add a boolean value to each column containing the tag?
This is my input table, in the file input.csv:
COL1 COL2 COL3 TAG
12 13 21 a
15 23 31 b
32 33 31 a
15 53 31 a
18 26 31 c
17 63 31 d
12 25 31 a
1 93 31 a
13 25 31 a
And this is what I am aiming to obtain:
COL1 COL2 COL3 a b c d ...
12 13 21 1 0 0 0
15 23 31 0 1 0 0
32 33 31 1 0 0 0
15 53 31 1 0 0 0
18 26 31 0 0 1 0
17 63 31 0 0 0 1
12 25 31 1 0 0 0
1 93 31 1 0 0 0
13 25 31 1 0 0 0
I tried to use pandas without success... Here is the piece of code I wrote
import pandas
column_to_replicate='tag'
df = pandas.read_csv("data.csv")
col_names = df[column_to_replicate].dropna().unique().tolist()
df[col_names] = pd.get_dummies(df[column_to_replicate])
@anky_91's answer works!
df=df.join(df.pop('TAG').str.get_dummies())
What you're looking for is called one-hot encoding. You can use the function get_dummies
to get the corresponding result:
import pandas as pd
one_hot_encoded = pd.get_dummies(df['TAG'])
one_hot_encoded.head()
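For completeness, here is a minimal end-to-end sketch that combines the two suggestions above: build the indicator columns with get_dummies and join them back onto the other columns. It assumes the file is whitespace-delimited, as it is displayed in the question, and it reads the sample rows from an inline string instead of input.csv so that it runs on its own. The names raw, dummies and result are only illustrative, and dtype=int is my choice to get 1/0 instead of True/False.

```python
import io

import pandas as pd

# Sample rows copied from the question; in practice, pass the path to
# input.csv to read_csv instead of this in-memory buffer.
raw = """COL1 COL2 COL3 TAG
12 13 21 a
15 23 31 b
32 33 31 a
15 53 31 a
18 26 31 c
17 63 31 d
12 25 31 a
1 93 31 a
13 25 31 a
"""

# Assumption: columns are separated by whitespace, not commas.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# One indicator column per distinct tag; dtype=int yields 1/0 values.
dummies = pd.get_dummies(df["TAG"], dtype=int)

# Drop the original TAG column and attach the indicator columns.
result = df.drop(columns="TAG").join(dummies)
print(result)
```

If the real file is comma-separated, dropping the sep argument should be enough. The join works here because dummies keeps the same row index as df.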