Applying a (row) function to a DataFrame changes column types

Question

I'm seeing unintended changes to column types, distilled in the example below. Column x holds floats and column icol holds ints. When testfunction (which does nothing) is applied, column icol changes to type float64, as this code demonstrates:

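import pandas as pd

df = pd.DataFrame({'x': [1000, -1000, 1.0]})
df['icol'] = 1
print(df.dtypes)  # x is float64, icol is int64

def testfunction(r):
    # Deliberately a no-op: returns the row unchanged
    return r

df = df.apply(testfunction, axis='columns')
print(df.dtypes)  # both columns are now float64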

However, if I make both the x and icol columns ints, then the types do not get changed.

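df = pd.DataFrame({'x': [1000, -1000]})
df['icol'] = 1
print(df.dtypes)  # both columns are int64

def testfunction(r):
    # Deliberately a no-op: returns the row unchanged
    return r

df = df.apply(testfunction, axis='columns')
print(df.dtypes)  # both columns are still int64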

This is a potential hazard, for example if the int column is later used as a key.

Is this a feature, or am I doing something wrong here? I'm running Python 3.7.3 on Ubuntu.

Thanks

Answer 1

Pandas tries to be as numerically efficient as possible, so each Series is stored with a single dtype. When applying an operation to a row, Pandas first constructs a Series from that row. If the row is a mix of ints and floats, the ints are converted to floats, just as when you pass a mixed list to the Series constructor: Series([1000.0, 1]) is converted to all floats, i.e. Series([1000.0, 1.0]).
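The upcast described above can be checked directly. This is a minimal sketch using the frame from the question; df.iloc[0] is used here only as a convenient way to look at a single row as a Series.

```python
import pandas as pd

# A list mixing a float and an int is stored as one float64 Series.
mixed = pd.Series([1000.0, 1])
print(mixed.dtype)  # float64

# The frame from the question: one float column, one int column.
df = pd.DataFrame({'x': [1000, -1000, 1.0]})
df['icol'] = 1

# Pulling out a single row yields a Series with one common dtype,
# so the int from icol is carried as a float.
row = df.iloc[0]
print(row.dtype)  # float64
```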

Consequently, if your row contains a string, the object dtype is used for the row and all of the types are preserved, at the cost of performance. In general, you should avoid apply if at all possible and use other Pandas methods to get the results.

df = pd.DataFrame({'x': [1000, -1000, 1.0]})
df['y'] = 1
df['z'] = 'hello'

# testfunction is the no-op function from the question
print(df.apply(testfunction, axis='columns').dtypes)
# prints:
# x    float64
# y      int64
# z     object
# dtype: object
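As a rough sketch of what "other Pandas methods" might look like: the column sum below is a hypothetical stand-in for whatever the row function would compute, and the column name total is invented for illustration. If explicit row iteration really is needed, itertuples is one option that hands back each value separately instead of packing the row into a single-dtype Series.

```python
import pandas as pd

df = pd.DataFrame({'x': [1000, -1000, 1.0]})
df['icol'] = 1

# Hypothetical row-wise computation written column-wise instead of via apply.
# The existing columns are not touched, so their dtypes stay as they were.
df['total'] = df['x'] + df['icol']
print(df.dtypes)

# itertuples yields one namedtuple per row; each field keeps the
# type of its own column rather than a common row dtype.
rows = list(df.itertuples(index=False))
for r in rows:
    print(type(r.x).__name__, type(r.icol).__name__)
```

Whether this fits depends on the real row function; logic with many branches per row may not translate to column operations as directly as a sum does.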

Answer 2

Thanks for the informative answer and comments. Here is another simple work-around, for anyone else who doesn't want to repent from using the row function pattern:

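df = pd.DataFrame({'x': [1000, -1000.1]})
df['icol'] = 1
print(df.dtypes)  # x is float64, icol is int64

def testfunction(r):
    # Deliberately a no-op: returns the row unchanged
    return r

# save the types
types = df.dtypes

df = df.apply(testfunction, axis='columns')
print(df.dtypes)  # both columns are float64 after the row-wise apply

# put 'em back
df = df.astype(types.to_dict(), copy=False)

print(df.dtypes)  # x is float64, icol is int64 again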
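The save-and-restore steps above could be wrapped in a small helper. This is only a sketch: apply_rows_keep_dtypes is a hypothetical name, and it assumes the row function returns a row (a Series) so that the result is still a DataFrame.

```python
import pandas as pd

def apply_rows_keep_dtypes(df, func):
    """Hypothetical helper: apply func to each row, then cast columns back."""
    original = df.dtypes.to_dict()
    result = df.apply(func, axis='columns')
    # Only restore columns that still exist after the apply
    restore = {col: dtype for col, dtype in original.items()
               if col in result.columns}
    return result.astype(restore)

df = pd.DataFrame({'x': [1000, -1000.1]})
df['icol'] = 1

out = apply_rows_keep_dtypes(df, lambda r: r)
print(out.dtypes)
```

Casting back is not free of risk: if the row function writes a fractional value into a column that was originally int, astype truncates it, and a NaN in such a column makes the cast raise.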

2 answers

solution1
6 ACCEPTED 2020-01-23 21:56:28

solution2
0 2020-01-24 14:42:53
