We have the following dtypes in our pandas dataframe:
>>> results_df.dtypes
_id                            int64
playerId                       int64
leagueId                       int64
firstName                     object
lastName                      object
fullName                      object
shortName                     object
gender                        object
nickName                      object
height                       float64
jerseyNum                     object
position                      object
teamId                         int64
updated          datetime64[ns, UTC]
teamMarket                    object
conferenceId                   int64
teamName                      object
updatedDate                   object
competitionIds                object
dtype: object
The object dtypes in the .dtypes output are not helpful here, since some of those columns are ordinary strings (e.g. firstName, lastName), whereas others are more complex (competitionIds is a numpy.ndarray of int64s).
We'd like to convert competitionIds, and any other numpy.ndarray columns, into list columns without naming competitionIds explicitly, since it isn't always known which columns hold numpy.ndarray values. So even though this works: results_df['competitionIds'] = results_df['competitionIds'].apply(list), it doesn't fully solve the problem, because competitionIds is passed explicitly; we need to detect the numpy.ndarray columns automatically.
Pandas treats just about anything that isn't an int, float or category as an "object" dtype (including lists), so the best way to go about this is to look at the type of an actual element of each column:
import pandas as pd
import numpy as np

df = pd.DataFrame([{'str': 'a', 'arr': np.random.randint(0, 4, 4)} for _ in range(3)])
# Convert element-wise only in columns whose first element is an ndarray
df = df.apply(lambda c: c.map(list) if isinstance(c.iloc[0], np.ndarray) else c)
Checking the element type also prevents you from converting other types that you may want to keep in place (e.g. sets).
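Applied to a frame shaped like the one in the question, the same check converts competitionIds without naming it (a sketch; the frame below is a small invented stand-in for results_df):

```python
import numpy as np
import pandas as pd

# Invented stand-in for results_df: one string column, one ndarray column
df = pd.DataFrame({
    'firstName': ['Ada', 'Grace'],
    'competitionIds': [np.array([1, 2], dtype=np.int64),
                       np.array([3], dtype=np.int64)],
})

# Convert element-wise only where the first element is an ndarray
df = df.apply(lambda c: c.map(list) if isinstance(c.iloc[0], np.ndarray) else c)

print(type(df['competitionIds'].iloc[0]))  # <class 'list'>
print(type(df['firstName'].iloc[0]))       # <class 'str'>
```

The string column passes through untouched, while every ndarray cell becomes a plain list.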
Here is a toy example of what I'm thinking:
import numpy as np

data = {'col1': np.nan, 'col2': np.ndarray(0)}
for col in data:
    print(isinstance(data[col], np.ndarray))
resulting in:
# False
# True
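Putting the two pieces together, the per-column loop version detects and converts ndarray columns in place while leaving everything else alone (a sketch on an invented frame; the set column stands in for types we want to keep):

```python
import numpy as np
import pandas as pd

# Invented frame: a string column, an ndarray column, and a set column
df = pd.DataFrame({
    'name': ['a', 'b'],
    'ids': [np.array([0, 1]), np.array([2, 3])],
    'tags': [{'x'}, {'y'}],  # should be left untouched
})

# Inspect the first element of each column; convert only ndarray columns
for col in df.columns:
    if len(df) and isinstance(df[col].iloc[0], np.ndarray):
        df[col] = df[col].apply(list)

print(type(df['ids'].iloc[0]))   # <class 'list'>
print(type(df['tags'].iloc[0]))  # <class 'set'>
```

The len(df) guard avoids indexing into an empty frame; note this assumes each column is homogeneous, so the first element is representative of the rest.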