I'm having an issue with unintended changes to column types, distilled below. Column x holds floats and column icol holds ints. When testfunction (which does nothing) is applied, column icol is changed to type float64, as this code demonstrates:
import pandas as pd
df = pd.DataFrame({'x': [1000, -1000, 1.0]})
df['icol'] = 1
print(df.dtypes)
def testfunction(r):
    pass
    return r
df = df.apply(testfunction, axis='columns')
print(df.dtypes)
However, if I make both the x and icol columns ints, then the types do not get changed.
df = pd.DataFrame({'x': [1000, -1000]})
df['icol'] = 1
print(df.dtypes)
def testfunction(r):
    pass
    return r
df = df.apply(testfunction, axis='columns')
print(df.dtypes)
This is a potential hazard, for example if an int column is used as a key later.
Is this a feature, or am I doing something wrong here? I'm running Python 3.7.3 on Ubuntu.
Thanks
Pandas tries to be as numerically efficient as possible. When applying a function to a row, Pandas first constructs a Series from that row. A Series has a single dtype, so if the row is a mix of ints and floats, the ints are converted to floats, just as when you pass a mixed list to the Series constructor: Series([1000.0, 1]) is converted to all floats, i.e. Series([1000.0, 1.0]). Conversely, if your row contains a string, the object dtype is used and all of the types are preserved, at the cost of performance. In general, you should avoid apply if at all possible and use other Pandas methods to get the results.
df = pd.DataFrame({'x': [1000, -1000, 1.0]})
df['y'] = 1
df['z'] = 'hello'
print(df.apply(testfunction, axis='columns').dtypes)
# prints:
# x    float64
# y      int64
# z     object
# dtype: object
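The upcasting described in this answer can be checked directly on the Series constructor and on a single row pulled out of a frame. This is only a small sketch; the variable names (mixed, with_str, row) are made up for illustration.

```python
import pandas as pd

# A list of a float and an int is stored under one common dtype: float64.
mixed = pd.Series([1000.0, 1])
print(mixed.dtype)

# Adding a string forces the object dtype, so each element keeps its own Python type.
with_str = pd.Series([1000.0, 1, "hello"])
print(with_str.dtype)
print(type(with_str.iloc[1]))

# A row taken from a frame with one float column and one int column
# is itself a Series, so it is upcast in the same way.
df = pd.DataFrame({"x": [1000, -1000, 1.0]})
df["icol"] = 1
row = df.iloc[0]
print(row.dtype)
```

The last print is the same conversion that apply with axis='columns' performs on each row before the function ever sees it.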
Thanks for the informative answer and comments. Here is another simple work-around, for anyone else who doesn't want to repent from using the row function pattern:
df = pd.DataFrame({'x': [1000, -1000.1]})
df['icol'] = 1
print(df.dtypes)
def testfunction(r):
    pass
    return r
# save the types
types = df.dtypes
df = df.apply(testfunction, axis='columns')
print(df.dtypes)
# put 'em back
df = df.astype(types.to_dict(), copy=False)
print(df.dtypes)
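If the save-and-restore pattern is needed in more than one place, it could be wrapped in a small helper. The sketch below is one possible way to do that; apply_rows_keep_dtypes is a hypothetical name, not a Pandas function, and it assumes the row function returns values that still fit the original dtypes.

```python
import pandas as pd

def apply_rows_keep_dtypes(df, func):
    """Hypothetical helper: apply func to each row, then cast columns back."""
    original = df.dtypes.to_dict()            # save the types
    result = df.apply(func, axis="columns")   # rows may be upcast here
    # Only restore columns that still exist in the result.
    restore = {col: dt for col, dt in original.items() if col in result.columns}
    return result.astype(restore)             # put 'em back

df = pd.DataFrame({"x": [1000, -1000.1]})
df["icol"] = 1
out = apply_rows_keep_dtypes(df, lambda r: r)
print(out.dtypes)
```

Casting back is not always safe: if the row function writes NaN into an int column, astype raises an error, and fractional values would be truncated, so this only suits functions that leave int columns holding whole numbers.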