
Pandas pivot changing dtype float to object

I have a DataFrame that looks like this:

      n_trigger      device_name  Collected charge (V s)  accepted
0             0  Speedy Gonzalez            2.913136e-12      True
1             0               #6            5.530943e-12      True
2             1  Speedy Gonzalez            1.530740e-11      True
3             1               #6            4.784455e-11      True
4             2  Speedy Gonzalez            6.736956e-12      True
...         ...              ...                     ...       ...
9507       5552               #6            1.155196e-11      True
9508       5553  Speedy Gonzalez            3.378050e-12      True
9509       5553               #6            9.158863e-12      True
9510       5554  Speedy Gonzalez            3.723929e-12      True
9511       5554               #6            1.401557e-11      True

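and this function:

def resample_measured_data(measured_data_df):
    # Pivot every column except the two used as index/columns.
    # list(...) because newer pandas versions reject a set as a column indexer.
    value_columns = list(set(measured_data_df.columns) - {'n_trigger', 'device_name'})
    resampled_df = measured_data_df.pivot(
        index='n_trigger',
        columns='device_name',
        values=value_columns,
    )
    # Resample the triggers with replacement, then go back to long format.
    resampled_df = resampled_df.sample(frac=1, replace=True)
    resampled_df = resampled_df.stack()
    resampled_df = resampled_df.reset_index()
    return resampled_df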

For some reason, the Collected charge (V s) column is changed from float64 to object. I know that pivot can change int to float, which is reasonable because it has to represent NaN values. But why does it change float64 to object here?

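I think this is because you passed two columns of different dtypes as "values" to pivot. Here those are Collected charge (V s), which is float64, and accepted, which is bool. When several value columns are pivoted together, they seem to be combined into a single block first, and the only dtype that can hold both float64 and bool is object.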
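One way to see the mechanism is to look at what happens when a float column and a bool column are forced into a single array. This is only a minimal sketch with made-up data: the frame demo and its column names are illustrative and not taken from the question.

```python
import pandas as pd

# Hypothetical two-row frame: one float column, one bool column.
demo = pd.DataFrame({"charge": [2.9e-12, 5.5e-12], "accepted": [True, False]})

# Each column on its own keeps its dtype.
print(demo.dtypes)

# Combined into one 2D array, pandas falls back to object,
# because it does not mix bool with numeric dtypes.
combined = demo[["charge", "accepted"]].to_numpy()
print(combined.dtype)
```

The second print shows the dtype that every value column inherits once they have been merged into one array.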

Let's look at a simple example df :

   a  b  c      d
0  1  2  1   True
1  2  2  0  False

>>> df.pivot(index='a', columns='b', values=['c','d']).dtypes
   b
c  2    object
d  2    object
dtype: object

This happens because c has dtype int and d has dtype bool. If we convert c to bool and check the dtypes again:

>>> df['c'] = df['c'].astype(bool)
>>> df.pivot(index='a', columns='b', values=['c','d']).dtypes
   b
c  2    bool
d  2    bool
dtype: object

we get bool, as expected. The same holds if we convert d to float or int instead: once the value columns are all numeric, the result has a numeric dtype rather than object.
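For anyone who wants to reproduce the toy example, here is a self-contained version. It rebuilds the small df shown above and uses keyword arguments for pivot, since recent pandas versions no longer accept them positionally; the names mixed and same are just labels chosen for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [2, 2], "c": [1, 0], "d": [True, False]})

# int64 and bool pivoted together: both value columns come back as object.
mixed = df.pivot(index="a", columns="b", values=["c", "d"])
print(mixed.dtypes)

# After casting c to bool, the value columns share one dtype, which is kept.
same = df.assign(c=df["c"].astype(bool)).pivot(
    index="a", columns="b", values=["c", "d"]
)
print(same.dtypes)
```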

Back to your data: if we convert the "accepted" column to a numeric dtype before pivoting:

# list(...) because newer pandas versions reject a set as a column indexer.
value_columns = list(set(measured_data_df.columns) - {'n_trigger', 'device_name'})
resampled_df = measured_data_df.assign(accepted=measured_data_df['accepted'].astype(int)).pivot(
    index='n_trigger',
    columns='device_name',
    values=value_columns,
)

>>> resampled_df.dtypes

                        device_name    
accepted                #6                 float64
                        Speedy Gonzalez    float64
Collected charge (V s)  #6                 float64
                        Speedy Gonzalez    float64
dtype: object

Both value columns are numeric now, so Collected charge (V s) keeps its float64 dtype.

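Finally, if we pivot the original data (with accepted left as bool) and fill the NaN values with 0, the float columns go back to float64, while accepted stays object. This relies on fillna re-inferring the dtype of object columns, which recent pandas versions deprecate, so it may not carry over: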

>>> resampled_df.fillna(0).dtypes

                        device_name    
Collected charge (V s)  #6                 float64
                        Speedy Gonzalez    float64
accepted                #6                  object
                        Speedy Gonzalez     object
dtype: object

You can also convert the object columns to float directly with astype(float):

>>> s = resampled_df['Collected charge (V s)'].astype(float)
>>> s.dtypes

device_name
#6                 float64
Speedy Gonzalez    float64
dtype: object
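Another option, sketched here with made-up data, is to avoid combining the value columns in the first place. When pivot is called without values, it pivots every remaining column and, as far as I can tell, reshapes each one with its own dtype. The frame measured below is a hypothetical stand-in for the real measurements.

```python
import pandas as pd

# Hypothetical stand-in for the measured data (complete, so no NaN appears).
measured = pd.DataFrame({
    "n_trigger": [0, 0, 1, 1],
    "device_name": ["Speedy Gonzalez", "#6", "Speedy Gonzalez", "#6"],
    "Collected charge (V s)": [2.9e-12, 5.5e-12, 1.5e-11, 4.8e-11],
    "accepted": [True, True, True, False],
})

# Without `values`, every remaining column is pivoted and keeps its own dtype.
wide = measured.pivot(index="n_trigger", columns="device_name")
print(wide.dtypes)
```

If some (n_trigger, device_name) pairs are missing, the NaN values that pivot introduces will still push a bool column to object, just as they push int to float, but the float column should be unaffected.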

