Python: Pandas read csv: downcasting while reading a csv

Question

I have the following problem. I want to read a large csv with millions of rows and hundreds of columns. I want to downcast the dtypes of the columns. My approach is to read the csv and then downcast it with pd.to_numeric(). I do not know the number of columns or their types. Is there any way to downcast while reading the csv, so that I do not have to touch the dataframe twice?

My current approach is:

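import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.read_csv(filePath, delimiter=delimiter, memory_map=True, engine='c', low_memory=True)
for column in df:
    if is_numeric_dtype(df[column]):
        # Shrink to the smallest signed integer type when the values allow it
        df[column] = pd.to_numeric(df[column], downcast='signed')
        # Shrink to float32 when the values allow it
        df[column] = pd.to_numeric(df[column], downcast='float')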

Thanks in advance!

Answer 1

If someone has the same problem: you can read just the first two lines, compute the dtypes pandas infers for them, map your preferred dtypes over those, and pass the result as the dtype argument when reading the whole file:

Example:

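import pandas as pd

# Read only the first two rows to see which dtype pandas infers per column
df = pd.read_csv(filePath, delimiter=delimiter, nrows=2, low_memory=True, memory_map=True, engine='c')
# Map the inferred dtypes to smaller ones.
# Note: 'int8' only holds -128..127; pick a wider type if later rows may exceed that.
mapdtypes = {'int64': 'int8', 'float64': 'float32'}
dtypes = list(df.dtypes.apply(str).replace(mapdtypes))
# Key the mapping by column name, as documented for the dtype argument
dtype = dict(zip(df.columns, dtypes))
df = pd.read_csv(filePath, delimiter=delimiter, memory_map=True, engine='c', low_memory=True, dtype=dtype)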
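
A variation on the same idea, as a minimal sketch rather than a definitive implementation: instead of mapping every int64 column to a fixed type, it samples some rows and picks the smallest signed integer type that fits the sampled values, and maps float columns to float32. The helper name `infer_small_dtypes`, the `sample_rows` parameter, and the demo file are assumptions made for illustration; they are not part of pandas or of the answer above. Values outside the sampled range are not handled here, so a larger sample (or a wider type) is safer for real data.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def infer_small_dtypes(path, sample_rows=1000, **read_kwargs):
    """Guess a compact dtype for each numeric column from a sample of rows."""
    sample = pd.read_csv(path, nrows=sample_rows, **read_kwargs)
    dtypes = {}
    for name in sample.columns:
        column = sample[name]
        if is_integer_dtype(column):
            # Smallest signed integer type that holds every sampled value.
            dtypes[name] = pd.to_numeric(column, downcast="integer").dtype
        elif is_float_dtype(column):
            # float32 halves the memory but keeps only about 7 significant digits.
            dtypes[name] = np.dtype("float32")
        # Non-numeric columns are left out, so pandas infers them as usual.
    return dtypes


# Small demo file standing in for the real csv (hypothetical data).
csv_text = "id,count,price,label\n1,100,1.5,a\n2,200,2.5,b\n3,300,3.5,c\n"
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write(csv_text)
    demo_path = handle.name

try:
    # First pass: sample two rows. Second pass: full read with reduced dtypes.
    small_dtypes = infer_small_dtypes(demo_path, sample_rows=2)
    df = pd.read_csv(demo_path, dtype=small_dtypes)
finally:
    os.remove(demo_path)

print(df.dtypes)
```

The file is still opened twice, but the first pass only parses the sampled rows, so the full dataframe is built once with the reduced dtypes.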

Answer 1
2, accepted, 2019-02-22 10:42:42
