
Add list or numpy array as column to a dask dataframe

How can I add a list or a numpy array as a column to a Dask dataframe? When I try the regular pandas syntax df['x'] = x, I get the error TypeError: Column assignment doesn't support type list.

You can add a pandas Series:

df["new_col"] = pd.Series(my_list, index=index_matching_df_index)

The index matters here because dask uses it to work out how to partition the data. The size of each partition in a dask dataframe is not always known, so you cannot assign by position.

I finally solved it by casting the list into a dask array with dask.array.from_array(), which I think is the most direct way.
