Python pandas - detect and convert numpy.ndarray columns to list columns

We have the following dtypes in our pandas DataFrame:

>>> results_df.dtypes
_id                              int64
playerId                         int64
leagueId                         int64
firstName                       object
lastName                        object
fullName                        object
shortName                       object
gender                          object
nickName                        object
height                         float64
jerseyNum                       object
position                        object
teamId                           int64
updated            datetime64[ns, UTC]
teamMarket                      object
conferenceId                     int64
teamName                        object
updatedDate                     object
competitionIds                  object
dtype: object

The object dtype in the .dtypes output is not helpful here: some columns hold ordinary strings (e.g. firstName, lastName), whereas others hold more complex values (each competitionIds value is a numpy.ndarray of int64s).

We'd like to convert competitionIds, and any other numpy.ndarray columns, into list columns without explicitly naming them, since it's not always known in advance which columns hold numpy.ndarray values. So even though results_df['competitionIds'] = results_df['competitionIds'].apply(list) works, it doesn't fully solve the problem: it names competitionIds explicitly, whereas we need to detect the numpy.ndarray columns automatically.

Pandas treats just about anything that isn't an int, float or category as an "object" (including lists), so the best way to go about this is to look at the type of an actual element of the column:

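import pandas as pd
import numpy as np

# Three rows: a plain string column and a column whose elements are ndarrays.
df = pd.DataFrame([{'str': 'a', 'arr': np.random.randint(0, 4, (4))} for _ in range(3)])

# Check the first element of each column by position; convert element-wise
# so every ndarray becomes a list, and leave other columns untouched.
df = df.apply(lambda c: c.apply(list) if isinstance(c.iloc[0], np.ndarray) else c)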

This also avoids converting other types that you may want to keep in place (e.g. sets).
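A slightly more defensive variant of the same idea, as a rough sketch rather than a definitive implementation. The helper names `ndarray_columns_to_lists` and `_is_missing` are hypothetical, and the sketch assumes that a column counts as an ndarray column when every non-missing value in it is an ndarray, with only None and float NaN treated as missing. Unlike checking just the first element, this tolerates missing values and empty frames.

```python
import numpy as np
import pandas as pd


def _is_missing(value):
    # Assumption: only None and float NaN count as missing here.
    return value is None or (isinstance(value, float) and value != value)


def ndarray_columns_to_lists(df):
    """Return a copy of df in which ndarray-valued columns hold lists instead."""
    out = df.copy()
    for col in out.columns:
        # Elements can only be ndarrays in object-dtype columns.
        if out[col].dtype != object:
            continue
        present = [v for v in out[col] if not _is_missing(v)]
        # Skip empty or all-missing columns, and columns with any non-array value.
        if not present or not all(isinstance(v, np.ndarray) for v in present):
            continue
        out[col] = out[col].map(
            lambda v: v.tolist() if isinstance(v, np.ndarray) else v
        )
    return out


# Sample data made up for illustration; column names echo the question.
df = pd.DataFrame({
    "firstName": ["Ann", "Bob", "Cy"],
    "competitionIds": [np.array([1, 2]), None, np.array([3])],
})
converted = ndarray_columns_to_lists(df)
print([type(v).__name__ for v in converted["competitionIds"]])
```

Using `tolist()` instead of `list` also turns the NumPy integers into plain Python ints and turns multi-dimensional arrays into nested lists, which may or may not be what you want.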

Here is a toy example of what I'm thinking:

import numpy as np

# One scalar value (NaN) and one empty ndarray.
data = {'col1': np.nan, 'col2': np.ndarray(0)}

for col in data:
    print(isinstance(data[col], np.ndarray))

resulting in:

#False
#True
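The same kind of element-level check can be run over every column of a DataFrame to get a summary that is more informative than .dtypes. This is only a sketch: `element_type_names` is a hypothetical helper, and the sample frame merely mimics a few columns from the question.

```python
import numpy as np
import pandas as pd


def element_type_names(df):
    """Map each column name to the Python type names of its elements."""
    return pd.Series({
        col: ", ".join(sorted({type(v).__name__ for v in df[col]}))
        for col in df.columns
    })


# Made-up sample data; column names echo the question.
df = pd.DataFrame({
    "firstName": ["Ann", "Bob"],
    "height": [1.8, 1.9],
    "competitionIds": [np.array([1, 2]), np.array([3])],
})

summary = element_type_names(df)
print(summary)

# Columns whose elements are all ndarrays.
ndarray_cols = summary[summary == "ndarray"].index.tolist()
print(ndarray_cols)
```

The scan visits every element, so on a large frame it may be worth sampling rows first.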
