Can't convert column to category dtype in Pandas with read_csv

I have data in a CSV file and load it with read_csv in Pandas. I tried to convert 6 columns to float32 and that worked, but the category column was not converted.

I have checked my 'div' column and there is no problem with it:

df_concat['div'].unique()

array(['L', 'J', 'K', 'U', 'E', 'B', 'A', 'C', 'N', 'X', 'M', 'O', 'D',
       'I', 'P', 'Q', 'S', 'R', 'T'], dtype=object)

When I limit the data with nrows=4000000, the column is converted to category dtype successfully. What's wrong?

This is my code:

import pandas as pd

names = ['bdate', 'nama_site', 'kode_store', 'div', 'merdivdesc', 'cat', 'catdesc', 'subcat', 'subcatdesc', 'brand', 'sku', 'sku_desc', 'tillcode', 'netsales', 'profit', 'margin', 'qty']

dtype = {
    'netsales' : 'float32', 'profit' : 'float32', 'margin' : 'float32', 'qty' : 'float32',
    'div' : 'category'
}

data = pd.read_csv('clean_jan20_minified.csv', sep='|', dtype=dtype, chunksize=20000, names=names, skiprows=[0], nrows=4000000)

chunk_list = []  
for chunk in data:  
    chunk_list.append(chunk)

df_concat = pd.concat(chunk_list, ignore_index=True)

When I convert it manually with df_concat['div'] = df_concat['div'].astype('category'), it works, but I need the conversion to happen in read_csv.

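When you use pd.concat, you lose the category data type. pd.concat only keeps a column as category when every piece has the same categories. With chunksize, each chunk's 'div' column only gets the categories that actually appear in that chunk, so if any chunk is missing one of the values, the combined column falls back to object. That would also explain why it works with nrows=4000000: presumably every chunk in the first 4 million rows happens to contain all of the 'div' values.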
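A minimal reproduction, using a small hypothetical in-memory CSV instead of the real file and chunksize=2 so that the value "X" only shows up in the second chunk:

```python
import io

import pandas as pd

# Hypothetical sample data: "X" only appears in the last row, so the
# first chunk (rows 1-2) never sees it.
csv_text = "div,qty\nA,1\nB,2\nA,3\nX,4\n"

chunks = list(pd.read_csv(io.StringIO(csv_text), dtype={"div": "category"}, chunksize=2))

# Each chunk infers its categories from the values it actually contains.
for chunk in chunks:
    print(list(chunk["div"].cat.categories))  # ['A', 'B'], then ['A', 'X']

# The categories differ between chunks, so concat cannot keep the
# category dtype and falls back to a plain (typically object) dtype.
df_concat = pd.concat(chunks, ignore_index=True)
print(df_concat["div"].dtype)
```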

See the section just above "General guidelines" at the end of this article: https://pbpython.com/pandas_dtypes_cat.html

"In this case, the data is still there but the type has been converted to an object. Once again, this is pandas attempt to combine the data without throwing errors but not making assumptions. If you want to convert to a category data type now, you can use astype('category')."
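If the conversion has to happen inside read_csv, one option is to pass a pandas.CategoricalDtype with an explicit list of categories instead of the string 'category'. Every chunk then gets the same categories, so pd.concat can keep the dtype. This is a sketch that assumes the 19 values from the unique() output in the question are the only values in the file (anything not in the list would be read as NaN); the in-memory CSV is a hypothetical stand-in for clean_jan20_minified.csv.

```python
import io

import pandas as pd

# Assumption: these are all the values that occur in 'div' (taken from the
# df_concat['div'].unique() output in the question).
div_categories = ['A', 'B', 'C', 'D', 'E', 'I', 'J', 'K', 'L', 'M',
                  'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'X']

dtype = {
    'qty': 'float32',
    # Fixed categories: every chunk gets exactly the same CategoricalDtype.
    # Values not in div_categories would become NaN.
    'div': pd.CategoricalDtype(categories=div_categories),
}

# Hypothetical stand-in for clean_jan20_minified.csv ('|'-separated).
csv_text = "div|qty\nA|1\nB|2\nA|3\nX|4\n"

reader = pd.read_csv(io.StringIO(csv_text), sep='|', dtype=dtype, chunksize=2)
df_concat = pd.concat(reader, ignore_index=True)

print(df_concat['div'].dtype)  # category
```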

Also, you might want to try .reorder_categories, as suggested in this post: "pandas - concat with columns of same categories turns to object".
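If the categories are not known in advance, a variant of that idea is to collect the union of all chunk categories with pandas.api.types.union_categoricals and apply it to every chunk with .cat.set_categories before concatenating. This sketch uses set_categories rather than reorder_categories because reorder_categories only accepts exactly the set of categories a chunk already has. The in-memory CSV is hypothetical.

```python
import io

import pandas as pd
from pandas.api.types import union_categoricals

# Hypothetical sample data: the two chunks see different 'div' values.
csv_text = "div,qty\nA,1\nB,2\nA,3\nX,4\n"

chunks = list(pd.read_csv(io.StringIO(csv_text), dtype={'div': 'category'}, chunksize=2))

# Union of the categories observed in every chunk, sorted for a stable order.
all_categories = union_categoricals([c['div'] for c in chunks], sort_categories=True).categories

# Give every chunk the same categories so pd.concat keeps the category dtype.
for c in chunks:
    c['div'] = c['div'].cat.set_categories(all_categories)

df_concat = pd.concat(chunks, ignore_index=True)
print(df_concat['div'].dtype)  # category
```

This keeps all chunks in memory until the union is known, which the code in the question already does anyway.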

Without sample data, I cannot help you troubleshoot further.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM