
Converting string/numerical data to categorical format in pandas

I have a very large CSV file that I have loaded into a pandas DataFrame; it contains string and integer/float values. I would like to convert this data to categorical format to try to save some memory. I am basing this idea on the documentation here: https://pandas.pydata.org/pandas-docs/version/0.20/categorical.html

My dataframe looks like the following:

    clean_data_measurements.head(20)

        station         date    prcp    tobs
    0   USC00519397 1/1/2010    0.08    65
    1   USC00519397 1/2/2010    0.00    63
    2   USC00519397 1/3/2010    0.00    74
    3   USC00519397 1/4/2010    0.00    76
    5   USC00519397 1/7/2010    0.06    70
    6   USC00519397 1/8/2010    0.00    64
    7   USC00519397 1/9/2010    0.00    68
    8   USC00519397 1/10/2010   0.00    73
    9   USC00519397 1/11/2010   0.01    64
    10  USC00519397 1/12/2010   0.00    61
    11  USC00519397 1/14/2010   0.00    66
    12  USC00519397 1/15/2010   0.00    65
    13  USC00519397 1/16/2010   0.00    68
    14  USC00519397 1/17/2010   0.00    64
    15  USC00519397 1/18/2010   0.00    72
    16  USC00519397 1/19/2010   0.00    66
    17  USC00519397 1/20/2010   0.00    66
    18  USC00519397 1/21/2010   0.00    69
    19  USC00519397 1/22/2010   0.00    67
    20  USC00519397 1/23/2010   0.00    67

It is precipitation data that goes on for another 2,700 rows. Since the station number is the same category throughout, it should be convertible to categorical format, which should save processing time. I am just unsure of how to write the code. Can anyone help? Thanks.

I think we can convert the object columns to category codes by using factorize:

import pandas as pd

# Select the string (object) columns and replace each one with integer codes
objectdf = df.select_dtypes(include='object')
df[objectdf.columns] = objectdf.apply(lambda x: pd.factorize(x)[0])
df
Out[452]: 
    station  date  prcp  tobs
0         0     0  0.08    65
1         0     1  0.00    63
2         0     2  0.00    74
3         0     3  0.00    76
5         0     4  0.06    70
6         0     5  0.00    64
7         0     6  0.00    68
8         0     7  0.00    73
9         0     8  0.01    64
10        0     9  0.00    61
11        0    10  0.00    66
12        0    11  0.00    65
13        0    12  0.00    68
14        0    13  0.00    64
15        0    14  0.00    72
16        0    15  0.00    66
17        0    16  0.00    66
18        0    17  0.00    69
19        0    18  0.00    67
20        0    19  0.00    67
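
Note that pd.factorize returns plain integer codes rather than the category dtype described in the linked documentation. If that dtype is the goal, a minimal sketch could look like the following. The sample frame and its values are made up for illustration and only mimic the shape of the question's data.

```python
import pandas as pd

# Hypothetical sample shaped like the question's data (values are made up).
df = pd.DataFrame({
    "station": ["USC00519397"] * 1000 + ["USC00513117"] * 1000,
    "prcp": [0.0, 0.08] * 1000,
})

before = df["station"].memory_usage(deep=True)

# Convert the repeated strings to the pandas category dtype.
df["station"] = df["station"].astype("category")

after = df["station"].memory_usage(deep=True)

# The original labels stay available; small integer codes are stored underneath.
# Codes follow the sorted category order, unlike factorize, which numbers
# values by first appearance.
codes = df["station"].cat.codes
labels = df["station"].cat.categories
print(before, after)
```

The saving depends on how often values repeat: a column like date, where nearly every value is unique, is unlikely to benefit much from this.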

You can try this as well:

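import pandas as pd

for col, dtype in zip(df.columns, df.dtypes):
    if dtype == 'object':
        # Replace strings with integer codes
        df[col] = pd.factorize(df[col])[0]
    elif pd.api.types.is_integer_dtype(dtype):
        # Smallest integer type that holds the values (a blind int8 cast can overflow)
        df[col] = pd.to_numeric(df[col], downcast='integer')
    elif pd.api.types.is_float_dtype(dtype):
        # float32 halves the memory at the cost of some precision
        df[col] = pd.to_numeric(df[col], downcast='float')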
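
To check whether a conversion like this actually saves memory, one option is to compare memory_usage(deep=True) before and after. The sketch below is only an illustration under stated assumptions: the sample data is made up, and it uses astype("category") and pd.to_numeric(..., downcast=...) instead of fixed casts so that values outside a small integer range are not silently corrupted.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the question's data (values are made up).
df = pd.DataFrame({
    "station": ["USC00519397"] * 500,
    "prcp": np.linspace(0.0, 1.0, 500),
    "tobs": np.arange(500) % 30 + 55,  # temperatures between 55 and 84
})

# deep=True counts the actual string contents, not just the pointers.
before = df.memory_usage(deep=True).sum()

small = df.copy()
small["station"] = small["station"].astype("category")
small["prcp"] = pd.to_numeric(small["prcp"], downcast="float")
small["tobs"] = pd.to_numeric(small["tobs"], downcast="integer")

after = small.memory_usage(deep=True).sum()
print(small.dtypes)
print(f"{before} bytes -> {after} bytes")
```

Working on a copy keeps the original frame intact, which makes it easy to confirm that the downcast columns still hold the same values.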

