
Mixed types of elements in DataFrame's column

Consider the following three DataFrames:

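import pandas as pd

df1 = pd.DataFrame([[1, 2], [4, 3]])     # both columns hold ints
df2 = pd.DataFrame([[1, .2], [4, 3]])    # second column mixes a float and an int
df3 = pd.DataFrame([[1, 'a'], [4, 3]])   # second column mixes a str and an int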

Here are the types of the elements in the second column of each DataFrame (the session below is Python 2, where map returns a list):

In [56]: map(type,df1[1])
Out[56]: [numpy.int64, numpy.int64]

In [57]: map(type,df2[1])
Out[57]: [numpy.float64, numpy.float64]

In [58]: map(type,df3[1])
Out[58]: [str, int]

In the first case, all ints are cast to numpy.int64. Fine. In the third case, there is basically no casting. However, in the second case, the integer (3) is cast to numpy.float64, probably because the other number is a float.
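The upcasting can be reproduced by checking the dtype pandas infers for each column. A minimal sketch, assuming a recent pandas version, where iterating over a Series yields Python scalars (int, float) rather than numpy.int64/numpy.float64, so the element type names differ from the Python 2 session above while the dtypes do not:

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [4, 3]])
df2 = pd.DataFrame([[1, .2], [4, 3]])
df3 = pd.DataFrame([[1, 'a'], [4, 3]])

# pandas infers one dtype per column: ints stay int64, an int mixed with
# a float is upcast to float64, and a str forces the generic object dtype
for name, df in [("df1", df1), ("df2", df2), ("df3", df3)]:
    print(name, df[1].dtype, [type(v).__name__ for v in df[1]])
```

In df2, the value 3 is stored as 3.0, which is exactly the casting the question is about.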

How can I control the casting? In the second case, I want to have either [float64, int64] or [float, int] as types.

Workaround:

If the goal is only how the values are displayed, a callable formatting function can serve as a workaround, as shown here.

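import numpy as np
import pandas as pd

def printFloat(x):
    # Show floats without a fractional part (e.g. 3.0) as integers ("3");
    # the isfinite check keeps int() from failing on inf
    if np.isfinite(x) and np.modf(x)[0] == 0:
        return str(int(x))
    else:
        return str(x)
# Affects display only; the column dtype stays float64
pd.options.display.float_format = printFloat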
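The formatter only changes how floats are rendered; the stored values and the column dtype are untouched. A sketch that checks this, redefining printFloat so the block runs on its own and using pd.option_context to apply the formatter only temporarily instead of setting it globally:

```python
import numpy as np
import pandas as pd

def printFloat(x):
    # Render floats without a fractional part (e.g. 3.0) as "3"
    if np.isfinite(x) and np.modf(x)[0] == 0:
        return str(int(x))
    return str(x)

df2 = pd.DataFrame([[1, .2], [4, 3]])

# The display option is restored when the with-block exits
with pd.option_context("display.float_format", printFloat):
    text = df2.to_string()

print(text)
print(df2[1].dtype)  # the data itself is still float64
```

Scoping the option this way avoids changing how every other float column in the session is printed.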

Each column of a pandas DataFrame (and each Series) has a single, homogeneous dtype. You can inspect it with dtype (or DataFrame.dtypes for all columns at once):

In [14]: df1[1].dtype
Out[14]: dtype('int64')

In [15]: df2[1].dtype
Out[15]: dtype('float64')

In [16]: df3[1].dtype
Out[16]: dtype('O')
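DataFrame.dtypes returns a Series mapping each column label to its dtype, which makes it clear that the casting is decided per column, not per element. A short sketch:

```python
import pandas as pd

df2 = pd.DataFrame([[1, .2], [4, 3]])

# One dtype per column: column 0 (only ints) stays int64,
# column 1 (an int mixed with a float) becomes float64
print(df2.dtypes)
```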

Only the generic 'object' dtype can hold arbitrary Python objects, so it is the only dtype whose columns can contain mixed types:

In [18]: df2 = pd.DataFrame([[1,.2],[4,3]], dtype='object')

In [19]: df2[1].dtype
Out[19]: dtype('O')

In [20]: map(type,df2[1])
Out[20]: [float, int]
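Note that dtype='object' has to be requested when the DataFrame is built. A sketch (assuming a recent pandas, as above) contrasting it with converting afterwards: by then the int has already been upcast to a float, and astype("object") merely wraps the existing float values:

```python
import pandas as pd

# Requested at construction: each element keeps its original Python type
df_obj = pd.DataFrame([[1, .2], [4, 3]], dtype="object")
print([type(v).__name__ for v in df_obj[1]])  # ['float', 'int']

# Converted afterwards: 3 was already stored as 3.0 in a float64 column
df_late = pd.DataFrame([[1, .2], [4, 3]]).astype("object")
print([type(v).__name__ for v in df_late[1]])  # ['float', 'float']
```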

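This is really not recommended, though: an object column stores references to individual Python objects instead of a NumPy array of machine numbers, so it loses pandas' fast vectorized operations and defeats much of the purpose of using pandas.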
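If a column does end up with object dtype, one option is to convert it back to a numeric dtype before doing heavy computation. A minimal sketch using pd.to_numeric; the Series s_obj is a hypothetical stand-in for such a mixed column:

```python
import pandas as pd

# Hypothetical mixed column: a Python float and a Python int in an object Series
s_obj = pd.Series([0.2, 3], dtype="object")
print(s_obj.dtype)  # object

# Back to a NumPy-backed dtype; the int is upcast to float again
s_num = pd.to_numeric(s_obj)
print(s_num.dtype)  # float64
```

This trades the per-element types for speed, which is the same trade-off pandas makes automatically in the second example.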

Is there a reason you specifically want both ints and floats in the same column?
