
How to Convert Pandas Dataframe to Single List

Suppose I have a dataframe:

   col1  col2  col3
0     1     5     2
1     7    13
2     9     1
3           7

How do I convert to a single list such as:

[1, 7, 9, 5, 13, 1, 7]

I have tried:

df.values.tolist()

However, this returns a list of lists rather than a single list:

[[1.0, 5.0, 2.0], [7.0, 13.0, nan], [9.0, 1.0, nan], [nan, 7.0, nan]]

Note that the dataframe will contain an unknown number of columns. The order of the values is not important, so long as the list contains all values in the dataframe.

I imagine I could write a function to unpack the values, but I'm wondering whether there is a simple built-in way of converting a dataframe to a series/list?

Following your current approach, you can flatten your array before converting it to a list. If you need to drop NaN values, you can do that after flattening as well:

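import numpy as np

arr = df.to_numpy().flatten()   # 2-D array -> 1-D copy, row by row
arr[~np.isnan(arr)].tolist()    # drop NaN; tolist() yields plain Python floats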

Also, the pandas documentation recommends to_numpy() over .values.
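
As a minimal, self-contained sketch of the flatten-then-filter idea: the `df` below is a reconstruction from the nested-list output shown in the question, not the asker's own code. Passing `order="F"` to `ravel` is an addition here; it reads the array column by column, which happens to match the order of the list in the question.

```python
import numpy as np
import pandas as pd

# Reconstructed from the question's printed output; NaN marks the empty cells.
df = pd.DataFrame({
    "col1": [1, 7, 9, np.nan],
    "col2": [5, 13, 1, 7],
    "col3": [2, np.nan, np.nan, np.nan],
})

# Flatten column by column (Fortran order), then drop the NaN entries.
arr = df.to_numpy().ravel(order="F")
flat = arr[~np.isnan(arr)].tolist()
print(flat)  # [1.0, 7.0, 9.0, 5.0, 13.0, 1.0, 7.0, 2.0]
```

This only works for numeric data: np.isnan raises a TypeError on an object-dtype array, for example when a column also holds strings.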


An alternative, perhaps cleaner, approach is to 'stack' your dataframe:

df.stack().tolist()

You can use DataFrame.stack():

In [12]: df = pd.DataFrame({"col1":[np.nan,3,4,np.nan], "col2":['test',np.nan,45,3]})

In [13]: df.stack().tolist()
Out[13]: ['test', 3.0, 4.0, 45, 3]
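
The session above relies on stack dropping NaN by default. As far as I know, pandas 2.1 added a future_stack=True mode in which missing values are kept, so the default may not hold on every version. A version-independent sketch, assuming the same `df` as in the session, filters explicitly with `pd.isna`, which also copes with the mixed string/number column where np.isnan would fail:

```python
import numpy as np
import pandas as pd

# Same frame as in the session above: one numeric column, one mixed column.
df = pd.DataFrame({"col1": [np.nan, 3, 4, np.nan],
                   "col2": ["test", np.nan, 45, 3]})

# Walk the values row by row and keep everything that is not missing.
flat = [v for v in df.to_numpy().ravel() if not pd.isna(v)]
print(flat)
```

This should give the same five values, in the same row-by-row order, as Out[13].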
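
Another option is to transpose the dataframe, flatten the underlying array, and filter out the NaN values:

values = df.T.values.reshape(1, -1).squeeze()
values = values[~np.isnan(values)].tolist()
values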

Output:

[1.0, 7.0, 9.0, 7.0, 5.0, 13.0, 1.0, 2.0]

For an ordered list (as in the problem statement), provided your data contains only integer values:

First get all items in the dataframe, column by column, then remove the NaN values from the list.

items = [item for sublist in [df[cols].tolist() for cols in df.columns] for item in sublist]
items = [int(x) for x in items if str(x) != 'nan']

For an unordered list, again provided your data contains only integer values:

items = [int(x) for x in sum(df.values.tolist(),[]) if str(x) != 'nan']
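
A caveat on the one-liner above: sum(list_of_lists, []) builds a new list at every step, and comparing str(x) with 'nan' is fragile. A hedged alternative sketch uses `itertools.chain.from_iterable` and `pd.notna` instead. It assumes integer-valued data and the question's dataframe, reconstructed here from the printed output:

```python
from itertools import chain

import numpy as np
import pandas as pd

# Reconstructed from the question's printed output; NaN marks the empty cells.
df = pd.DataFrame({
    "col1": [1, 7, 9, np.nan],
    "col2": [5, 13, 1, 7],
    "col3": [2, np.nan, np.nan, np.nan],
})

# Chain the rows together lazily, skip missing values, cast the rest to int.
rows = df.to_numpy().tolist()
items = [int(x) for x in chain.from_iterable(rows) if pd.notna(x)]
print(items)  # [1, 5, 2, 7, 13, 9, 1, 7]
```

The values come out row by row, so this matches the "unordered" variant; iterating over df.T.to_numpy().tolist() instead would give the column-by-column order.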
