Python pandas - detect and convert numpy.ndarray columns to list columns
We have the following dtypes in our pandas dataframe:
>>> results_df.dtypes
_id int64
playerId int64
leagueId int64
firstName object
lastName object
fullName object
shortName object
gender object
nickName object
height float64
jerseyNum object
position object
teamId int64
updated datetime64[ns, UTC]
teamMarket object
conferenceId int64
teamName object
updatedDate object
competitionIds object
dtype: object
The object dtype in the .dtypes output is not helpful here: some columns hold ordinary strings (e.g. firstName, lastName), whereas other columns are more complex (competitionIds holds numpy.ndarray values of int64s).
We'd like to convert competitionIds, and any other columns that hold numpy.ndarray values, into list columns without naming competitionIds explicitly, since it's not always known in advance which columns are the numpy.ndarray columns. So, even though this works: results_df['competitionIds'] = results_df['competitionIds'].apply(list), it doesn't entirely solve the problem, because competitionIds is passed explicitly here, whereas we need to detect the numpy.ndarray columns automatically.
Pandas stores any value that has no dedicated dtype (such as the numeric, boolean, datetime and category dtypes) as "object", and that includes lists and arrays. So the best way to go about this is to look at the type of an actual element of the column:
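import numpy as np
import pandas as pd

# Three rows: a plain string column and a column of 4-element integer arrays.
df = pd.DataFrame([{'str': 'a', 'arr': np.random.randint(0, 4, 4)} for _ in range(3)])
# Convert each array to a list, but only in columns whose first element is an ndarray.
df = df.apply(lambda c: c.apply(list) if isinstance(c.iloc[0], np.ndarray) else c)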
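Because the check is specific to np.ndarray, columns holding other container types that you may want to keep in place (e.g. sets) are left untouched. Note that only the first element of each column is inspected, so this assumes every column is non-empty and holds a single type.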
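If the first row of a column can be missing, checking only the first element may miss an array column. The following is a minimal sketch, not a definitive implementation, that checks every non-missing value instead. The helper names ndarray_columns and arrays_to_lists and the demo frame are hypothetical, made up for illustration; it also uses ndarray.tolist() rather than list, on the assumption that plain Python ints are preferred over NumPy scalars inside the lists.

```python
import numpy as np
import pandas as pd


def ndarray_columns(df):
    """Names of object-dtype columns whose non-missing values are all ndarrays."""
    names = []
    for name in df.columns:
        col = df[name]
        if col.dtype != object:
            continue  # non-object dtypes (int64, float64, datetime64, ...) hold no arrays
        values = col.dropna()
        if len(values) > 0 and all(isinstance(v, np.ndarray) for v in values):
            names.append(name)
    return names


def arrays_to_lists(df):
    """Return a copy of df with every detected ndarray column converted to lists."""
    out = df.copy()
    for name in ndarray_columns(out):
        # Leave missing values as they are; convert only actual arrays.
        out[name] = out[name].apply(
            lambda v: v.tolist() if isinstance(v, np.ndarray) else v
        )
    return out


# Hypothetical stand-in for results_df, with one missing value.
demo = pd.DataFrame({
    "firstName": ["Ann", "Bob", "Cy"],
    "height": [1.8, 1.9, 2.0],
    "competitionIds": [np.array([1, 2]), None, np.array([3])],
})
converted = arrays_to_lists(demo)
```

Scanning every value costs more than peeking at the first one, so on large frames it may be worth limiting the check to a sample of rows.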
Here is a toy example of what I'm thinking:
import numpy as np

# A scalar NaN and an (empty) ndarray: isinstance tells them apart.
data = {'col1': np.nan, 'col2': np.empty(0)}
for col in data:
    print(isinstance(data[col], np.ndarray))
resulting in:
#False
#True
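The same idea can be turned into a quick diagnostic for a whole DataFrame. This is only a sketch under assumptions: element_types is a hypothetical helper name and the small frame is made-up data. It reports which Python types actually sit behind each object column, which is the information .dtypes does not show.

```python
import numpy as np
import pandas as pd


def element_types(df):
    """Map each object-dtype column to the set of Python type names it contains."""
    summary = {}
    for name in df.columns:
        if df[name].dtype == object:
            summary[name] = {type(v).__name__ for v in df[name]}
    return summary


# Made-up data for illustration only.
df = pd.DataFrame({
    "firstName": ["Ann", "Bob"],
    "height": [1.8, 1.9],
    "competitionIds": [np.array([1, 2]), np.array([3])],
})
types_by_column = element_types(df)
print(types_by_column)
```

A column that reports more than one type name (for example both ndarray and NoneType) is a hint that a first-element check alone would not be reliable for it.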