Python pandas - Detect and convert numpy.ndarray columns to list columns

Question

We have the following dtypes in our pandas dataframe:

>>> results_df.dtypes
_id                              int64
playerId                         int64
leagueId                         int64
firstName                       object
lastName                        object
fullName                        object
shortName                       object
gender                          object
nickName                        object
height                         float64
jerseyNum                       object
position                        object
teamId                           int64
updated            datetime64[ns, UTC]
teamMarket                      object
conferenceId                     int64
teamName                        object
updatedDate                     object
competitionIds                  object
dtype: object

The object types in the .dtypes output are not helpful here, since some columns are ordinary strings (e.g. firstName, lastName), whereas other columns are more complex (each competitionIds value is a numpy.ndarray of int64s).
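To illustrate the problem, here is a small sketch with a hypothetical two-row frame standing in for results_df (the values are made up). It assumes pandas' classic default of storing strings with dtype object, as in the output above. The dtype alone says nothing about what the cells hold, but mapping type over each column's cells does:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for results_df (made-up values): one plain-string
# column and one column whose cells are numpy arrays of int64.
df = pd.DataFrame({
    "firstName": ["Ann", "Bob"],
    "competitionIds": [np.array([1, 2], dtype=np.int64),
                       np.array([3], dtype=np.int64)],
})

# The array column is reported as plain "object", which says nothing
# about what its cells actually contain.
print(df.dtypes)

# Mapping type() over each column's cells reveals what is actually stored.
for col in df.columns:
    print(col, df[col].map(type).unique())
```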

We'd like to convert competitionIds, and any other numpy.ndarray columns, into list columns without explicitly passing competitionIds, since it's not always known which columns are the numpy.ndarray columns. So, even though this works: results_df['competitionIds'] = results_df['competitionIds'].apply(list), it doesn't entirely solve the problem, because I'm explicitly passing competitionIds here, whereas we need to automatically detect which columns are the numpy.ndarray columns.

Answer 1

Pandas treats just about anything that isn't numeric, boolean, datetime or categorical as "object" (including lists), so the best way to go about this is to look at the type of an actual element of each column:

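import pandas as pd
import numpy as np

# Example frame: a string column and a column whose cells are numpy arrays
df = pd.DataFrame([{'str': 'a', 'arr': np.random.randint(0, 4, size=4)} for _ in range(3)])

# For every column, check the type of its first cell; if it is an ndarray,
# convert each cell of that column to a list, otherwise keep the column as is
df = df.apply(lambda c: c.apply(list) if isinstance(c.iloc[0], np.ndarray) else c)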

Because only columns whose elements are numpy arrays get converted, this also avoids converting other object columns you may want to keep as they are (e.g. columns of sets).
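A slightly more defensive variant of the same idea, as a sketch: the helper name ndarray_columns_to_lists is hypothetical, not a pandas function. Instead of looking only at the first cell, it requires every non-null cell of an object column to be a numpy array, so a leading None/NaN or an empty frame does not trip it up, and it returns a converted copy rather than modifying the original.

```python
import numpy as np
import pandas as pd


def ndarray_columns_to_lists(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df where every object column whose non-null cells
    are all numpy arrays has those cells converted to Python lists.

    Hypothetical helper for illustration; not part of pandas.
    """
    out = df.copy()
    for col in out.columns:
        # Only object columns can hold ndarray cells; skip numeric, datetime, etc.
        if not pd.api.types.is_object_dtype(out[col]):
            continue
        values = out[col].dropna()
        is_array = values.map(lambda v: isinstance(v, np.ndarray))
        # Skip all-null columns and columns with any non-ndarray cell
        # (plain strings, sets, ...).
        if len(values) == 0 or not is_array.all():
            continue
        # Convert arrays to lists; null cells are left as they are.
        out[col] = out[col].map(
            lambda v: v.tolist() if isinstance(v, np.ndarray) else v
        )
    return out


# Example usage with made-up data: a leading null in the array column,
# plus a column of sets that should be left alone.
players = pd.DataFrame({
    "firstName": ["Ann", "Bob", "Cy"],
    "competitionIds": [None, np.array([1, 2]), np.array([3])],
    "tags": [{"a"}, {"b"}, {"c"}],
    "height": [1.8, 1.9, 2.0],
})
converted = ndarray_columns_to_lists(players)
print(converted["competitionIds"].tolist())
```

Using arr.tolist() instead of list(arr) is a deliberate choice in this sketch: it yields plain Python ints rather than numpy int64 scalars, which matters if the frame is later passed to json.dumps (it cannot serialize numpy int64). Swap in list to match the question's .apply(list) exactly.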

Answer 2

Here is a toy example of what I'm thinking:

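import numpy as np

# A NaN scalar and an (empty) numpy array stand in for two column values
data = {'col1': np.nan, 'col2': np.ndarray(0)}

# isinstance() tells the ndarray apart from other values
for col in data:
    print(isinstance(data[col], np.ndarray))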

resulting in:

#False
#True
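Carrying that isinstance() check over from a dict to an actual DataFrame is a matter of applying it to each column's cells. The sketch below uses made-up data, and the ndarray_cols name is just for illustration: it first collects the names of the columns in which every cell is a numpy array, then converts only those with the question's own .apply(list).

```python
import numpy as np
import pandas as pd

# Made-up example frame
df = pd.DataFrame({
    "firstName": ["Ann", "Bob"],
    "competitionIds": [np.array([1, 2]), np.array([3])],
})

# Apply the isinstance() test to every cell of each column and keep the
# names of the columns in which every cell is a numpy array.
ndarray_cols = [
    col for col in df.columns
    if df[col].map(lambda v: isinstance(v, np.ndarray)).all()
]
print(ndarray_cols)  # ['competitionIds']

# Convert only the detected columns, exactly as in the question's one-liner.
for col in ndarray_cols:
    df[col] = df[col].apply(list)
```

Note that .all() is vacuously true for an empty column, so on an empty frame every column would be flagged. Converting an empty column is harmless, but check len(df) first if the list of names itself matters to you.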


2 answers

Answer 1: score 2, accepted, 2020-11-28 20:09:09

Answer 2: score 1, 2020-11-28 20:02:54
