Ignoring mismatched column types with to_csv and Dask

Question

I am trying to export a dataframe using Dask with the dask.dataframe.to_csv(dataframe_name, file etc..) command that is listed in the Dask manual: https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.read_csv

I'm using Dask because the original CSV file is very large (20 GB) and reading it with pandas was very slow.

However, every time I try to export the dataframe I get the following error:

ValueError: Mismatched dtypes found in `pd.read_csv`/`pd.read_table`.

+----------+--------+----------+
| Column   | Found  | Expected |
+----------+--------+----------+
| DeviceID | object | int64    |
| Lat      | object | float64  |
| Long     | object | float64  |
+----------+--------+----------+

It is strange that the columns are being found as objects, when their dtypes are integer and float.

Is there a way to ignore reading the columns' types and just export the dataframe as is?

Answer 1

The error message is telling you that, when reading your data from the original CSV files, Dask dataframe found that your data wasn't actually numeric, as it had originally guessed. A common cause for this is that a few rows of your data aren't numeric in some way. Perhaps you have a custom NA value, or some of the rows of your data are mismatched in some way.
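To act on this, it can help to locate the rows that don't parse as numbers before deciding how to read the file. The following is only a minimal sketch using the Python standard library, not something from the original answer: the helper name find_non_numeric and the sample data are hypothetical, and only the column names come from the error message above. It streams the file row by row, so it does not need to load the whole CSV into memory.

```python
import csv
import io


def find_non_numeric(lines, columns, max_hits=10):
    """Return up to max_hits (line_number, column, raw_value) tuples
    for values that do not parse as a float.

    Note: float() is a loose check (it also accepts "nan" and "inf"),
    so treat the result as a first pass, not a full validation.
    """
    hits = []
    reader = csv.DictReader(lines)
    for row in reader:
        for col in columns:
            value = row[col]
            try:
                float(value)
            except (TypeError, ValueError):
                # TypeError covers short rows, where the value is None.
                hits.append((reader.line_num, col, value))
                if len(hits) >= max_hits:
                    return hits
    return hits


# Hypothetical sample data; with a real file, pass open(path, newline="") instead.
sample = io.StringIO(
    "DeviceID,Lat,Long\n"
    "1,51.5,-0.12\n"
    "2,N/A,-0.13\n"
    "abc,51.7,\n"
)
bad = find_non_numeric(sample, ["DeviceID", "Lat", "Long"])
for line_num, col, value in bad:
    print(f"line {line_num}: {col} = {value!r}")
```

Once the offending values are known, one option is to tell the reader about them: dask.dataframe.read_csv forwards keyword arguments such as na_values and dtype to pandas.read_csv, so a custom NA marker can be declared with na_values, or the three columns can be read as text with dtype={"DeviceID": "object", "Lat": "object", "Long": "object"} and cleaned up afterwards. Which of these fits depends on what the bad rows actually contain.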

Solution 1
0 2019-10-19 13:41:06
