
How to convert the data as follows in Python?

I have some data in the following format in a CSV file.

   Id   Category
    1   A
    2   B
    3   C
    4   B
    5   C
    6   d

I'd like to convert it into the format below and save it to another CSV file.

Id  A   B   C   D   E
1   1   0   0   0   0
2   0   1   0   0   0
3   0   0   1   0   0
4   0   1   0   0   0
5   0   0   1   0   0
6   0   0   0   1   0

Try pd.get_dummies():

>> df = pd.read_csv(<path_to_file>, sep=',', encoding='utf-8', header=0)

>> df
   Id   Category
0   1          A
1   2          B
2   3          C
3   4          B
4   5          C
5   6          d

>> pd.get_dummies(df.Category)

This will encode Category and give you new columns:

A B C d

But it will not 'fix' d -> D, and it will not give you any columns that cannot be deduced from the values present in Category (such as E in your desired output).

I suggest you check the solution posted in the comment earlier for that.

EDIT
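The comment referred to above is not reproduced here, so the following is only a sketch of one possible way to handle both points: upper-casing the values to turn d into D, and reindexing the dummy columns against an explicit list so that E appears even though no row uses it. The list `expected_categories` and the inline sample data are assumptions made for illustration.

```python
import io

import pandas as pd

# Inline stand-in for the CSV file from the question (assumption for the demo).
csv_text = "Id,Category\n1,A\n2,B\n3,C\n4,B\n5,C\n6,d\n"
df = pd.read_csv(io.StringIO(csv_text))

# Assumed full set of categories, taken from the desired output header.
expected_categories = ["A", "B", "C", "D", "E"]

# Normalize case so that 'd' and 'D' are treated as the same category.
normalized = df["Category"].str.strip().str.upper()

# One-hot encode, then add any missing category columns filled with 0.
dummies = pd.get_dummies(normalized, dtype=int).reindex(
    columns=expected_categories, fill_value=0
)

result = pd.concat([df[["Id"]], dummies], axis=1)
print(result)
```

Reindexing also silently drops any category that is not in `expected_categories`, so it is worth checking the distinct values in the data first.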

# Load data from .CSV with pd.read_csv() as demonstrated above

In [13]: df
Out[13]: 
  Category  Id
0        A   1
1        B   2
2        C   3
3        B   4
4        C   5
5        D   6

## One-liner for one-hot encoding, then concatenating to the original dataframe
## and finally dropping the old column 'Category'. You can skip the
## last part if you want to keep the original column as well.
In [14]: df = pd.concat([df, pd.get_dummies(df.Category)], axis=1).drop('Category', axis=1)

In [15]: df
Out[15]: 
   Id    A    B    C    D
0   1  1.0  0.0  0.0  0.0
1   2  0.0  1.0  0.0  0.0
2   3  0.0  0.0  1.0  0.0
3   4  0.0  1.0  0.0  0.0
4   5  0.0  0.0  1.0  0.0
5   6  0.0  0.0  0.0  1.0

## Write to file (sep='\t' gives tab-separated output; use sep=',' for comma-separated)
In [16]: df.to_csv(<output_path>, sep='\t', encoding='utf-8', index=False)

As you can see, this is not a transpose; the result of one-hot encoding the Category column is simply added to each row.

Whether Excel accepts the final data or not, there's unfortunately not much you can do about that from Pandas.

I hope this helps.
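Depending on the pandas version, the dummy columns may come out as floats, small integers, or booleans rather than the plain 0/1 asked for in the question. A minimal sketch, assuming integer output and a comma-separated file are what is wanted, passes `dtype=int` to `pd.get_dummies()`. The inline dataframe and the in-memory buffer stand in for the real input and output files.

```python
import io

import pandas as pd

# Stand-in for the dataframe loaded from the CSV file (assumption for the demo).
df = pd.DataFrame({"Id": [1, 2, 3, 4, 5, 6], "Category": list("ABCBCD")})

# dtype=int forces 0/1 integers instead of floats or booleans.
encoded = pd.concat(
    [df.drop(columns="Category"), pd.get_dummies(df["Category"], dtype=int)],
    axis=1,
)

# Write comma-separated text without the row index; a file path could be
# passed to to_csv() instead of this in-memory buffer.
buffer = io.StringIO()
encoded.to_csv(buffer, index=False)
csv_output = buffer.getvalue()
print(csv_output)
```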

Use a pivot table (updated to include .csv read/write functionality):

import pandas as pd
path = 'the path to your file'
df = pd.read_csv(path)

# your original dataframe
# Category  Id
# 0        A   1
# 1        B   2
# 2        C   3
# 3        B   4
# 4        C   5
# 5        D   6

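# pivot table: one row per Id, one column per Category, counting occurrences
result = df.pivot_table(index=['Id'], columns='Category', fill_value=0, aggfunc='size')

# save the pivoted result (not the original dataframe) to file;
# the raw string keeps '\f' from being read as an escape sequence
result.to_csv(r'path\filename.csv')  # e.g. r'C:\Users\you\Documents\filename.csv'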

OUTPUT:

Category  A  B  C  D
Id                  
1         1  0  0  0
2         0  1  0  0
3         0  0  1  0
4         0  1  0  0
5         0  0  1  0
6         0  0  0  1
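As a quick sanity check, the sketch below compares the pivot-table result with the `pd.get_dummies()` approach on the same sample data. It assumes every Id appears exactly once; with repeated Ids, `aggfunc='size'` would produce counts greater than 1 and the two results would differ.

```python
import pandas as pd

# Stand-in for the dataframe loaded from the CSV file (assumption for the demo).
df = pd.DataFrame({"Id": [1, 2, 3, 4, 5, 6], "Category": list("ABCBCD")})

# Pivot: count how many rows fall into each (Id, Category) cell.
pivoted = df.pivot_table(index="Id", columns="Category", aggfunc="size", fill_value=0)

# One-hot encode with Id as the index so both results have the same shape.
dummies = pd.get_dummies(df.set_index("Id")["Category"], dtype=int)

# True when both approaches produce the same 0/1 matrix.
same = bool((pivoted.to_numpy() == dummies.to_numpy()).all())
print(same)
```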

