How to store the Pandas get_dummies transformation in Python?

Question

The pandas package in Python has a get_dummies transformation, which converts categorical variables into binary (flag) variables with values 0 / 1. The transformation is based on the values actually present in the data, but I'd like to store the transformation itself, so that I can run it on other datasets that contain fewer distinct values and still get the full-sized transformed data structure.

Say you have this code:

import pandas as pd

# Two numeric columns (already inferred as integers) and one categorical column
a = [[5,12,"blue"], [8,53,"yellow"]]
df = pd.DataFrame(a, columns=['Weight','Size','Color'])
df

which produces this data:

Weight  Size    Color
5       12      blue
8       53      yellow

and:

df = pd.get_dummies(df)
df

produces this:

Weight  Size    Color_blue  Color_yellow
5       12      1           0
8       53      0           1

I'd like to store this original transformation, so that if I get a record later, like:

[2,9,"blue"]

I can still get the whole structure, like:

Weight  Size    Color_blue  Color_yellow
2       9       1           0

get_dummies will omit the Color_yellow column in the latter case.
What is the simplest solution to this?
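
A minimal reproduction of the omission described above, assuming a pandas version whose get_dummies accepts the dtype argument (it is passed here only so the flags are integers rather than booleans on recent versions):

```python
import pandas as pd

# A later dataset that contains only one of the two original colors
new = pd.DataFrame([[2, 9, "blue"]], columns=["Weight", "Size", "Color"])

# get_dummies only sees "blue", so no Color_yellow column is created
new_dummies = pd.get_dummies(new, dtype=int)
print(list(new_dummies.columns))
```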

I was thinking of building my own get_dummies function, which goes through all the categorical variables, collects all their distinct values, and then generates the code of a Python function that performs the transformation. But there must be an already implemented solution for this...

Answer 1

This is what I was looking for. The code prints the transformation statements, which can then be run on later datasets:

import pandas as pd
import numpy as np

a = [[5,12,"blue","apple"], [8,53,"yellow","pear"], [1,8,"brown","peach"],[1,2,"blue","plum"]]
df = pd.DataFrame(a, columns=['Weight','Size','Color','Fruit'])

# For every text column, create one 0/1 column per distinct value and
# print the equivalent statement so it can be replayed on later datasets
for col in df.select_dtypes(include=["object"]).columns:
    for i in df[col].unique():
        df[col+"_"+i] = np.where(df[col] == i, 1, 0)
        print('df["'+col+'_'+i+'"] = np.where(df["'+col+'"] == "'+i+'", 1, 0)')
    # Drop the original categorical column once its flag columns exist
    df = df.drop(columns=[col])
    print('df = df.drop(columns=["'+col+'"])')
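
As an alternative to printing statements, a commonly used approach is to keep the column list produced on the original data and align later datasets to it with DataFrame.reindex. The sketch below is not part of the original answer; the helper name apply_stored_dummies and the variable names are hypothetical, and dtype=int is passed only so the flags are 0/1 integers on recent pandas versions.

```python
import pandas as pd

# Original data: the column list of its dummy encoding is what gets stored
train = pd.DataFrame([[5, 12, "blue"], [8, 53, "yellow"]],
                     columns=["Weight", "Size", "Color"])
dummy_columns = list(pd.get_dummies(train, dtype=int).columns)


def apply_stored_dummies(new_df, columns):
    """Hypothetical helper: encode new_df, then align it to the stored columns."""
    encoded = pd.get_dummies(new_df, dtype=int)
    # Missing flag columns are added and filled with 0;
    # columns not in the stored list are dropped
    return encoded.reindex(columns=columns, fill_value=0)


# A later record that contains only one of the original colors
new = pd.DataFrame([[2, 9, "blue"]], columns=["Weight", "Size", "Color"])
result = apply_stored_dummies(new, dummy_columns)
print(result)
```

Note that with this approach a category that did not appear in the original data (for example a new color) is silently dropped by reindex, so it may be worth checking for unseen values separately.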

Answer 1: accepted, 2018-02-26 09:23:26
