
Create a Dataframe in Dask

I'm just starting to use Dask as a possible replacement (?) for pandas. The first thing that struck me is that I can't seem to find a way to create a dataframe from a couple of lists/arrays.

In regular pandas I just do pd.DataFrame({'a': a, 'b': b, ...}), but I can't find an equivalent way to do it in Dask, other than creating the df in pandas and then turning it into a Dask df with from_pandas().

Is there any way? Or is the only option literally to create the df in pandas and then "import" it into a Dask df?

There is a fairly recent feature by @MrPowers that allows creating a dask.DataFrame using the from_dict method:

from dask.dataframe import DataFrame
ddf = DataFrame.from_dict({"num1": [1, 2, 3], "num2": [7, 8, 9]}, npartitions=2)

However, note that this method is meant to keep dask.DataFrame code concise in tutorials and code examples, so when working with real datasets it's better to use more appropriate methods, e.g. read_csv or read_parquet.
