Python pandas - detect and convert numpy.ndarray columns to list columns

Question

We have the following dtypes in our pandas dataframe:

>>> results_df.dtypes
_id                              int64
playerId                         int64
leagueId                         int64
firstName                       object
lastName                        object
fullName                        object
shortName                       object
gender                          object
nickName                        object
height                         float64
jerseyNum                       object
position                        object
teamId                           int64
updated            datetime64[ns, UTC]
teamMarket                      object
conferenceId                     int64
teamName                        object
updatedDate                     object
competitionIds                  object
dtype: object

The object dtypes in the .dtypes output are not helpful here, since some columns hold ordinary strings (e.g. firstName, lastName), whereas others are more complex (competitionIds holds numpy.ndarrays of int64s).

We'd like to convert competitionIds, and any other numpy.ndarray columns, into list columns without naming competitionIds explicitly, since it's not always known in advance which columns hold numpy.ndarrays. So although results_df['competitionIds'] = results_df['competitionIds'].apply(list) works, it doesn't fully solve the problem, because it names competitionIds explicitly, whereas we need to detect the numpy.ndarray columns automatically.

Answer 1

Pandas treats just about anything that isn't an int, float or category as an "object" (including lists), so the best way to go about this is to look at the type of an actual element of the column:

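import numpy as np
import pandas as pd

# Toy frame: one string column and one column of numpy arrays
df = pd.DataFrame([{'str': 'a', 'arr': np.random.randint(0, 4, 4)} for _ in range(3)])

# Look at the first element of each column (by position, not by label);
# if it is an ndarray, convert every element of that column to a list
df = df.apply(lambda c: c.apply(list) if isinstance(c.iloc[0], np.ndarray) else c)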

Because the check is specific to np.ndarray, this also leaves other types that you may want to keep in place (e.g. sets) unconverted.
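
Note that checking only the first element would skip a column whose first value is missing or of another type. A slightly more defensive sketch follows, assuming a column counts as an array column only when all of its non-null values are np.ndarray. The helper names ndarray_columns and ndarrays_to_lists and the sample data are made up for illustration:

```python
import numpy as np
import pandas as pd


def ndarray_columns(df):
    """Return names of object columns whose non-null values are all numpy arrays."""
    found = []
    for name in df.select_dtypes(include="object").columns:
        values = df[name].dropna()
        if len(values) > 0 and values.map(lambda v: isinstance(v, np.ndarray)).all():
            found.append(name)
    return found


def ndarrays_to_lists(df):
    """Return a copy of df with every detected ndarray column converted to lists."""
    out = df.copy()
    for name in ndarray_columns(out):
        # tolist() also turns numpy scalars into plain Python ints/floats
        out[name] = out[name].map(lambda v: v.tolist() if isinstance(v, np.ndarray) else v)
    return out


# Hypothetical sample data loosely modelled on the question's frame
df = pd.DataFrame({
    "firstName": ["Ann", "Bob", "Cy"],
    "height": [1.8, 1.9, 2.0],
    "competitionIds": [np.array([1, 2]), np.array([3]), np.array([4, 5, 6])],
})

converted = ndarrays_to_lists(df)
print(ndarray_columns(df))                  # ['competitionIds']
print(converted["competitionIds"].tolist()) # [[1, 2], [3], [4, 5, 6]]
```

This scans every element of each object column, so it is slower than peeking at one element; on large frames you may prefer to test a sample instead.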

Answer 2

Here is a toy example of what I'm thinking:

import numpy as np

# One plain float value and one (empty) numpy array
data = {'col1': np.nan, 'col2': np.ndarray(0)}

for col in data:
    print(isinstance(data[col], np.ndarray))

resulting in:

#False
#True
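
The same isinstance idea could be carried over to a DataFrame by listing which element types each column actually contains. This is only a rough sketch; the frame and the names element_types and array_cols are invented for the example:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a float column, an ndarray column and a string column
df = pd.DataFrame({
    "col1": [np.nan, np.nan],
    "col2": [np.array([1, 2]), np.array([3, 4])],
    "col3": ["a", "b"],
})

# Map each column name to the sorted type names found among its elements
element_types = {
    name: sorted({type(value).__name__ for value in df[name]})
    for name in df.columns
}

# Columns whose elements are exclusively numpy arrays
array_cols = [name for name, kinds in element_types.items() if kinds == ["ndarray"]]
print(array_cols)  # ['col2']
```

Looking at element_types directly also shows mixed columns (for example strings alongside arrays), which a single isinstance test on one element would not reveal.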

2 answers

solution1
2 ACCEPTED 2020-11-28 20:09:09

solution2
1 2020-11-28 20:02:54