I have some data in the following format in a csv file.
Id Category
1 A
2 B
3 C
4 B
5 C
6 d
I'd like to convert it into the format below and save it to another CSV file.
Id A B C D E
1 1 0 0 0 0
2 0 1 0 0 0
3 0 0 1 0 0
4 0 1 0 0 0
5 0 0 1 0 0
6 0 0 0 1 0
Try pd.get_dummies():
>> import pandas as pd
>> df = pd.read_csv(<path_to_file>, sep=',', encoding='utf-8', header=0)
>> df
Id Category
0 1 A
1 2 B
2 3 C
3 4 B
4 5 C
5 6 d
>> pd.get_dummies(df.Category)
This will encode Category and give you new columns: A B C d
But it will not 'fix' d -> D, and it will not give you any columns that cannot be deduced from the values present in Category (such as E).
I suggest you check the solution posted in the comment earlier for that.
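Both gaps (the lowercase d and the missing E column) can be closed around the encoding step. Here is a minimal sketch, assuming the full set of categories is known up front as A through E. That list is an assumption taken from the desired output in the question, and the inline frame stands in for the CSV:

```python
import pandas as pd

# Sample data standing in for the CSV from the question.
df = pd.DataFrame({"Id": [1, 2, 3, 4, 5, 6],
                   "Category": ["A", "B", "C", "B", "C", "d"]})

# Assumed full category list, taken from the desired output in the question.
all_categories = ["A", "B", "C", "D", "E"]

# Upper-case the values so 'd' becomes 'D', encode them, then reindex the
# columns so categories absent from the data (here 'E') appear as all zeros.
dummies = (pd.get_dummies(df["Category"].str.upper(), dtype=int)
             .reindex(columns=all_categories, fill_value=0))

result = pd.concat([df[["Id"]], dummies], axis=1)
print(result)
```

If the real data contains a category outside the assumed list, reindex silently drops its column, so it may be worth checking the unique values first.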
EDIT
# Load data from .CSV with pd.read_csv() as demonstrated above
In [13]: df
Out[13]:
Category Id
0 A 1
1 B 2
2 C 3
3 B 4
4 C 5
5 D 6
## One-liner for hot-encoding, then concatenating to original dataframe
## and finally dropping the old column 'Category', you can skip the
## last part if you want to keep original column as well.
In [14]: df = pd.concat([df, pd.get_dummies(df.Category)], axis=1).drop('Category', axis=1)
In [15]: df
Out[15]:
Id A B C D
0 1 1.0 0.0 0.0 0.0
1 2 0.0 1.0 0.0 0.0
2 3 0.0 0.0 1.0 0.0
3 4 0.0 1.0 0.0 0.0
4 5 0.0 0.0 1.0 0.0
5 6 0.0 0.0 0.0 1.0
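## Write to file (sep='\t' writes tab-separated values; use sep=',' for a
## comma-separated file). index=False leaves the row index out of the output.
In [16]: df.to_csv(<output_path>, sep='\t', encoding='utf-8', index=False)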
As you can see, this is not a transpose; only the result of one-hot encoding the Category column is added to each row.
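The concat-and-drop step can probably be folded into a single get_dummies call. This is a sketch rather than the method used above: the inline frame stands in for the CSV, the empty prefix and dtype=int are choices made here to get bare column names and integer 0/1 values, and an in-memory buffer replaces the output path:

```python
import io
import pandas as pd

# Sample data standing in for the CSV from the question (with 'D' fixed).
df = pd.DataFrame({"Id": [1, 2, 3, 4, 5, 6],
                   "Category": ["A", "B", "C", "B", "C", "D"]})

# columns= encodes only 'Category' and removes it from the result;
# an empty prefix keeps the bare category names; dtype=int yields 0/1.
encoded = pd.get_dummies(df, columns=["Category"],
                         prefix="", prefix_sep="", dtype=int)

# Write comma-separated text to a buffer instead of a real file.
buffer = io.StringIO()
encoded.to_csv(buffer, sep=",", index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a file path instead of the buffer to to_csv would write the same text to disk.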
Whether Excel accepts the final data or not, there's not much you can do with Pandas about this, unfortunately.
I hope this helps.
Use a pivot table (updated to include .csv read/write functionality):
import pandas as pd
path = 'the path to your file'
df = pd.read_csv(path)
# your original dataframe
# Category Id
# 0 A 1
# 1 B 2
# 2 C 3
# 3 B 4
# 4 C 5
# 5 D 6
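# pivot table (assign the result; pivot_table does not modify df in place)
result = df.pivot_table(index=['Id'], columns='Category', fill_value=0, aggfunc='size')
# save the pivoted result to file; the raw string keeps '\f' from being read as an escape
result.to_csv(r'path\filename.csv') #e.g. 'C:\\Users\\you\\Documents\\filename.csv'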
OUTPUT:
Category A B C D
Id
1 1 0 0 0
2 0 1 0 0
3 0 0 1 0
4 0 1 0 0
5 0 0 1 0
6 0 0 0 1
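The pivoted output keeps Id as the index and labels the column axis Category. If a flat header like the one in the question is wanted, something along these lines may work. It is a sketch using an inline frame in place of the CSV, and the variable names are illustrative:

```python
import pandas as pd

# Sample data standing in for the CSV from the question (with 'D' fixed).
df = pd.DataFrame({"Id": [1, 2, 3, 4, 5, 6],
                   "Category": ["A", "B", "C", "B", "C", "D"]})

# Count occurrences of each (Id, Category) pair; missing pairs become 0.
pivoted = df.pivot_table(index="Id", columns="Category",
                         aggfunc="size", fill_value=0)

# Turn the Id index back into a column and drop the 'Category' axis label.
flat = pivoted.reset_index().rename_axis(columns=None)
print(flat)
```

Writing flat with to_csv(..., index=False) should then give a header row of Id followed by the category names.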