简体   繁体   中英

Python pandas - detect and convert numpy.ndarray columns to list columns

We have the following dtypes in our pandas dataframe:

>>> results_df.dtypes
_id                              int64
playerId                         int64
leagueId                         int64
firstName                       object
lastName                        object
fullName                        object
shortName                       object
gender                          object
nickName                        object
height                         float64
jerseyNum                       object
position                        object
teamId                           int64
updated            datetime64[ns, UTC]
teamMarket                      object
conferenceId                     int64
teamName                        object
updatedDate                     object
competitionIds                  object
dtype: object

The object types are not helpful in the .dtypes output here since some columns are ordinary strings (eg. firstName , lastName ), whereas other columns are more complex ( competitionIds is an numpy.ndarray of int64s).

We'd like to convert competitionIds , and any other columns that are numpy.ndarray columns, into list columns, without explicitly passing competitionIds , since it's not always known which columns are the numpy.ndarray columns. So, even though this works: results_df['competitionIds'] = results_df['competitionIds'].apply(list) , it doesn't entirely solve the problem because I'm explicitly passing competitionIds here, whereas we need to automatically detect which columns are the numpy.ndarray columns.

Pandas treats just about anything that isn't an int, float or category as an "object" (including list s.): So the best way to go about this is to look at the type of an actual element of the column:

import pandas as pd
import numpy as np

df = pd.DataFrame([{'str': 'a', 'arr': np.random.randint(0, 4, (4))} for _ in range(3)])

df.apply(lambda c: list(c) if isinstance(c[0], np.ndarray)  else c)

This will prevent you from converting other types that you may want to keep in place (eg sets) as well.

Here is a toy example of what I'm thinking:

import numpy as np

data = {'col1':np.nan, 'col2':np.ndarray(0)}

for col in data:
    print(isinstance(data[col],np.ndarray))

resulting in:

#False
#True

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM