
How do I use df.astype() inside an apply function?

I have a DataFrame in which every column has dtype object. I want to convert all of them to numeric types using astype(), but I don't want to list every column like this ->

df.astype({'col1': 'int32', 'col2': 'int32'....})

If I do something like this ->

[image: enter image description here]

I get an error because the apply function needs a Series to traverse.

PS: The other option of doing the same thing is ->

df.apply(pd.to_numeric)

But I want to do this using astype(). Is there a way to convert all object-type columns to numeric with df.astype(), without going through df.apply()?

Use df = df.astype(int) to convert all columns to the int dtype. astype() returns a new DataFrame rather than modifying df in place, so assign the result back.

import numpy

df.astype(numpy.int32)
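As a minimal sketch of the options discussed so far, the snippet below converts a small frame three ways: the whole frame at once, with a generated column-to-dtype mapping, and with astype() called inside apply(). The frame and its column names are made up for illustration, and it assumes every value is a string that parses cleanly as an integer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: numeric strings stored in object columns.
df = pd.DataFrame(
    {"col1": ["1", "2", "3"], "col2": ["10", "20", "30"]}, dtype=object
)

# 1) Whole frame at once; astype returns a new DataFrame.
all_at_once = df.astype(np.int32)

# 2) Build the column-to-dtype mapping instead of typing it out by hand.
by_mapping = df.astype({col: "int32" for col in df.columns})

# 3) astype inside apply: the lambda receives each column as a Series.
via_apply = df.apply(lambda col: col.astype("int32"))

print(all_at_once.dtypes)
```

All three raise if any value cannot be parsed as an integer, which is why the to_numeric route with errors='coerce' is often preferred for messy data.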

If these are object columns and you're certain they can be "soft-cast" to int, you have two options:

df
  worker day    tasks
0      A   2     read
1      A   9    write
2      B   1     read
3      B   2    write
4      B   4  execute

df.dtypes

worker    object
day       object
tasks     object
dtype: object

pandas <= 0.25

infer_objects (0.21+ only) casts your data to numpy types if possible.

df.infer_objects().dtypes

worker    object
day        int64
tasks     object
dtype: object
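A short sketch of what "soft" means here, using made-up column names: infer_objects() only re-types object columns whose values are already Python numbers. It does not parse strings, so a column of digit strings stays non-numeric.

```python
import pandas as pd

# Hypothetical data: "as_int" stores Python ints, "as_str" stores digit strings.
df = pd.DataFrame(
    {"as_int": [2, 9, 1], "as_str": ["2", "9", "1"]}, dtype=object
)

inferred = df.infer_objects()

# "as_int" is re-typed to a NumPy integer dtype; "as_str" is not parsed.
print(inferred.dtypes)
```

For the string case, a "hard" conversion such as astype(int) or pd.to_numeric is needed.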

pandas >= 1.0

convert_dtypes casts your data to the most specific pandas extension dtype if possible.

df.convert_dtypes().dtypes

worker    string
day        Int64
tasks     string
dtype: object
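One practical difference with the extension dtypes is that Int64 (capital I) is nullable, so an integer column with a missing entry does not have to fall back to float. A small sketch, assuming a hypothetical "day" column with one missing value:

```python
import pandas as pd

# Hypothetical object column holding ints plus one missing value.
df = pd.DataFrame({"day": [2, None, 1]}, dtype=object)

converted = df.convert_dtypes()

# The column becomes nullable Int64 and the gap is kept as <NA>.
print(converted.dtypes)
print(converted)
```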

Also see this answer by me for more information on "hard" versus "soft" conversions.

In my opinion, the safest approach is to use pd.to_numeric in your apply call, which also lets you choose how errors are handled: coerce, raise or ignore. Once the columns are numeric you can safely call astype() on them, but I wouldn't suggest starting with astype():

df.apply(pd.to_numeric, errors='ignore')

If the column can't be converted to numeric, it will remain unchanged. Note that errors='ignore' is deprecated as of pandas 2.2.

df.apply(pd.to_numeric, errors='coerce')

The columns will be converted to numeric, and any value that can't be converted will be replaced with NaN.

df.apply(pd.to_numeric, errors='raise')

A ValueError will be raised if the column can't be converted to numeric.
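The following sketch puts the coerce and raise modes side by side and then applies astype() to the cleaned result, as suggested above. The frame, its column names and the bad value "oops" are invented for illustration.

```python
import pandas as pd

# Hypothetical frame: "qty" contains one unparseable value, "price" is clean.
df = pd.DataFrame(
    {"qty": ["1", "2", "oops"], "price": ["1.5", "2.0", "3.25"]}, dtype=object
)

# coerce: unparseable values become NaN, so "qty" ends up as float64.
coerced = df.apply(pd.to_numeric, errors="coerce")

# raise: the same data triggers a ValueError instead.
raised = False
try:
    df.apply(pd.to_numeric, errors="raise")
except ValueError as exc:
    raised = True
    print("raise mode failed:", exc)

# With the column numeric, astype is safe; nullable Int64 keeps the gap as <NA>.
qty_int = coerced["qty"].astype("Int64")
print(qty_int)
```

Using the nullable "Int64" target here is a design choice: a plain astype(int) would fail on the NaN that coerce introduced.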
