
Convert to Categorical Data in Python DataFrame from a CSV

I have insurance data with a column called 'region' in which each record's region is specified. For data visualization purposes, I need to change those text values to numerical values, e.g. South should change to 1. If there were only one or two regions I could change them by hand, but the number of regions is larger, so that is not practical. Is there any method to do this? The following is the code I have so far (not sure whether it's correct):

# Map each distinct region name to an integer code, starting at 1
dict1 = {}
for region in insurance['region']:
    if region not in dict1:
        # First time this region appears: give it the next free number
        dict1[region] = len(dict1) + 1
print(dict1)

What code would solve this problem?

If you can use third-party libraries, you can leverage pandas' factorize. Following the docs, here is an example with toy data:

import pandas as pd

df = pd.DataFrame({"region": ["b", "c", "d" , "a", "a"]})

df["region_as_num"], _ = pd.factorize(df["region"], sort=True)
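
Note that factorize numbers the labels from 0, and with sort=True the numbers follow the sorted order of the labels. Since the question asks for numbering that starts at 1, here is a minimal sketch of one way to get that, along with a lookup table for plot legends. The region names and the names region_to_num and region_cat_code are made up for illustration; the categorical dtype variant at the end is an alternative, not part of the answer above.

```python
import pandas as pd

# Toy data standing in for the insurance table (region names are assumptions).
df = pd.DataFrame({"region": ["south", "north", "east", "south", "west"]})

# factorize returns the integer codes and the unique labels they point to.
codes, uniques = pd.factorize(df["region"], sort=True)

# Codes are 0-based; shift by one so numbering starts at 1.
df["region_as_num"] = codes + 1

# Lookup table from label to number, e.g. for labelling a plot.
region_to_num = {label: i + 1 for i, label in enumerate(uniques)}

# Alternative: the categorical dtype keeps labels and codes together.
df["region_cat_code"] = df["region"].astype("category").cat.codes + 1

print(df)
print(region_to_num)
```

Both columns hold the same numbers here because string categories are ordered alphabetically by default, which matches sort=True.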
