
Python: Pandas read_csv: Downcasting while reading a CSV

I have the following problem. I want to read a large CSV with millions of rows and hundreds of columns, and I want to downcast the dtypes of the columns. My approach is to read the CSV and then downcast it with pd.to_numeric(). I do not know the number of columns or their types in advance. Is there any way to downcast while reading the CSV, so that I do not have to touch the DataFrame twice?

My current approach is:

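import pandas as pd
from pandas.api.types import is_numeric_dtype

# filePath and delimiter are defined elsewhere.
df = pd.read_csv(filePath, delimiter=delimiter, memory_map=True, engine='c', low_memory=True)
for column in df:
    if is_numeric_dtype(df[column]):
        # Try the smallest signed integer dtype that fits the values.
        df[column] = pd.to_numeric(df[column], downcast='signed')
        # Then try to shrink float columns to float32.
        df[column] = pd.to_numeric(df[column], downcast='float')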

Thanks in advance!

If someone has the same problem: you can read just the first two lines, compute their dtypes, map your preferred dtypes onto them, and pass the result as the dtype argument when reading the whole file.

Example:

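import pandas as pd

# Read only the first two rows to infer the default dtypes.
sample = pd.read_csv(filePath, delimiter=delimiter, nrows=2, low_memory=True, memory_map=True, engine='c')
# Map each inferred dtype to the smaller dtype you prefer.
mapdtypes = {'int64': 'int8', 'float64': 'float32'}
# Build {column name: dtype}; dtypes without a mapping (e.g. object) are kept.
dtype = sample.dtypes.astype(str).replace(mapdtypes).to_dict()
# Read the whole file with the chosen dtypes.
df = pd.read_csv(filePath, delimiter=delimiter, memory_map=True, engine='c', low_memory=True, dtype=dtype)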
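One caveat with inferring dtypes from two rows: int8 only holds -128 to 127, and float32 keeps roughly 7 significant digits, so a mapping chosen from the head of the file may not fit later rows. A possible alternative, shown here only as a rough sketch, is to read the file in chunks and let pd.to_numeric pick the dtype per chunk; pd.concat then promotes each column to the widest dtype seen. The helper names downcast_numeric and read_csv_downcast, the chunk size, and the in-memory sample data are all made up for illustration and are not from the original posts.

```python
import io

import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def downcast_numeric(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with integer and float columns shrunk where values allow."""
    out = frame.copy()
    for name in out.columns:
        if is_integer_dtype(out[name]):
            out[name] = pd.to_numeric(out[name], downcast="integer")
        elif is_float_dtype(out[name]):
            out[name] = pd.to_numeric(out[name], downcast="float")
    return out


def read_csv_downcast(source, chunksize=100_000, **kwargs) -> pd.DataFrame:
    """Read a CSV in chunks, downcasting each chunk before concatenating."""
    chunks = pd.read_csv(source, chunksize=chunksize, **kwargs)
    return pd.concat((downcast_numeric(c) for c in chunks), ignore_index=True)


# Hypothetical in-memory data standing in for the real file.
csv_text = "a,b,c\n1,0.5,x\n2,1.5,y\n300,2.5,z\n"
df = read_csv_downcast(io.StringIO(csv_text), chunksize=2)
print(df.dtypes)
```

In this sketch the wide int64/float64 columns only ever exist one chunk at a time, but the DataFrame is still processed in two steps per chunk, and the final concatenation temporarily holds both the chunks and the result in memory. Whether that is acceptable depends on the file and the chunk size.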
