
Python - best approach to mapping codes in data to description

I am getting the results I want, but I want to understand whether this would be considered the best, or even a correct, way of mapping data codes to descriptors.

I have a dataset where many of the values are stored as numeric codes that represent some attribute, e.g.:

Fruit_Type:
1 = Apple,
2 = Orange,
3 = Banana,
4 = Grape

In SAS, I would have used PROC FORMAT to map each numeric code to its descriptor. In SQL, I would typically use a CASE expression, which lets me either keep the original field name or assign a new one.
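For comparison with the SQL approach, a CASE-style mapping can be sketched in pandas with numpy.select. This is a swapped-in technique, not what the question uses: each boolean condition plays the role of one WHEN branch, and the default argument acts like ELSE (the "Unknown" label is a hypothetical choice). When every branch is a plain equality test, a dictionary lookup tends to be shorter; np.select is more useful when the conditions are ranges or combinations of columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Fruit_Type": [1, 2, 2, 3, 1, 2, 4]})

# Each condition corresponds to one WHEN branch of a SQL CASE expression.
conditions = [
    df["Fruit_Type"] == 1,
    df["Fruit_Type"] == 2,
    df["Fruit_Type"] == 3,
    df["Fruit_Type"] == 4,
]
labels = ["Apple", "Orange", "Banana", "Grape"]

# default= behaves like the ELSE branch; "Unknown" is a hypothetical label.
df["rpt_Fruit_Type"] = np.select(conditions, labels, default="Unknown")
print(df)
```

Assigning the result to df["Fruit_Type"] instead of a new column would mirror the SQL option of keeping the original field name.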

I am fairly new to Python and am curious what would be considered the best approach here. What I have been using, which seems to work fine, is to create the mapping as a dictionary and then create a new column with the .apply method. This works, but is it the right way to do this?

import pandas as pd 
# Create sample dataframe  
data = {'Fruit_Type':[1, 2, 2, 3, 1, 2, 4], 
        'other_data':['blah', 'blah','blah', 'blah','blah', 'blah',
                      'blah']} 

df = pd.DataFrame(data) 

# Create the code-to-label dictionary
Fruit_Type_dictionary = {1: 'Apple',
                        2: 'Orange',
                        3: 'Banana',
                        4: 'Grape'}

df['rpt_Fruit_Type'] = df['Fruit_Type'].apply(lambda x: Fruit_Type_dictionary.get(x))

print(df) 

which yields:

   Fruit_Type other_data rpt_Fruit_Type
0           1       blah          Apple
1           2       blah         Orange
2           2       blah         Orange
3           3       blah         Banana
4           1       blah          Apple
5           2       blah         Orange
6           4       blah          Grape

which pretty much gives me my desired results.

I would use the Series.map method for readability:

df['rpt_Fruit_Type']= df['Fruit_Type'].map(Fruit_Type_dictionary)
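A minimal sketch of how Series.map behaves on the question's data, assuming one extra hypothetical code (5) that has no entry in the dictionary. Codes missing from the dict come back as NaN, which fillna can replace with an explicit label (the "Unknown" label is an assumption, chosen only for illustration). The apply version from the question gives the same labels when .get is given the same default; map simply expresses the lookup directly and avoids calling a Python lambda for every row.

```python
import pandas as pd

Fruit_Type_dictionary = {1: "Apple", 2: "Orange", 3: "Banana", 4: "Grape"}

# Sample codes from the question, plus a hypothetical code 5 with no label.
df = pd.DataFrame({"Fruit_Type": [1, 2, 2, 3, 1, 2, 4, 5]})

# Series.map looks each value up in the dict; unmatched codes become NaN.
df["rpt_Fruit_Type"] = df["Fruit_Type"].map(Fruit_Type_dictionary)

# Optional: give unmatched codes an explicit label, similar to an ELSE branch.
df["rpt_Fruit_Type"] = df["rpt_Fruit_Type"].fillna("Unknown")

# The question's apply-based lookup, using the same default, for comparison.
via_apply = df["Fruit_Type"].apply(lambda x: Fruit_Type_dictionary.get(x, "Unknown"))
print((via_apply == df["rpt_Fruit_Type"]).all())  # True
print(df)
```

To keep the original field name instead of adding a column, assign the mapped result back to df["Fruit_Type"].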
