Pandas says every column is an object, even though I think it's an integer

I have a DataFrame in which every column is somehow of dtype object, which I think should be okay. Notice that the first column has values like "10180.0".

Problem solved: there was some kind of odd Unicode issue in the data. My task lead solved it: we read the file in directly as Excel instead of converting it to CSV first (I had been using LibreOffice for that conversion). A big hint was that all of these things that "should" work were not working.
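The post does not say which Unicode character caused the trouble. As an assumption, typical culprits after a spreadsheet-to-CSV conversion are the non-breaking space (U+00A0) and zero-width characters such as U+200B or the byte-order mark U+FEFF: they are invisible when printed but stop numeric parsing. A minimal sketch for spotting and removing them, using made-up sample values:

```python
import pandas as pd

# Hypothetical values: the second ends in a non-breaking space (U+00A0) and the
# third starts with a zero-width space (U+200B). Both print like normal text.
codes = pd.Series(["10180.0", "10420.0\u00a0", "\u200b10500.0"])

# Flag values containing any character outside the ASCII range.
suspect = codes[codes.str.contains(r"[^\x00-\x7F]", regex=True)]
print(suspect.map(ascii).tolist())  # ascii() shows hidden characters as escapes

# Remove the invisible characters (U+FEFF is the byte-order mark), then convert.
cleaned = codes.str.replace("[\u00a0\u200b\ufeff]", "", regex=True)
as_int = cleaned.astype(float).astype("int64")
print(as_int.tolist())  # [10180, 10420, 10500]
```

Reading the original workbook with pandas.read_excel, as the task lead did, sidesteps the conversion step entirely.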

Those should all be "10180", with no decimal. (Note that Jupyter displays them correctly; they only show up with a decimal when I write the data out as CSV. However, Jupyter does report the column as object.)

Another potential problem is the data values that look like "2,361.9". Those should be floats. I thought I could do something similar with them: strip the commas and then convert.
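Stripping the commas does work as long as they are only thousands separators. A minimal sketch on a few values copied from the question's sample data, assuming the column was loaded as strings (dtype object):

```python
import pandas as pd

# A few "property" values from the sample data, loaded as strings (dtype object).
prop = pd.Series(["2,361.9", "2,226.0", "3,734.5", "1,892.3"])

# The comma is a literal character here, so regex=False avoids any pattern
# interpretation; what remains is a plain float literal.
prop_float = prop.str.replace(",", "", regex=False).astype(float)
print(prop_float.tolist())  # [2361.9, 2226.0, 3734.5, 1892.3]
```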

Sample data:

CBSA Code,CBSA Title,violent,murder,rape,robbery,assault,property,burglary,larceny,vehicle theft
10180.0,"Abilene, TX",393.2,5.3,64.0,65.7,258.2,"2,361.9",534.0,"1,670.0",157.8
10420.0,"Akron, OH",361.6,6.4,48.7,73.0,233.6,"2,226.0",415.6,"1,659.4",150.9
10500.0,"Albany, GA",728.5,11.6,30.6,95.1,591.3,"3,734.5",773.4,"2,715.1",246.0
10580.0,"Albany-Schenectady-Troy, NY",283.7,2.2,38.3,62.4,180.8,"1,892.3",226.9,"1,584.8",80.6
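When the file is read with pandas.read_csv, the thousands parameter can remove those separators during parsing, so quoted values like "2,361.9" arrive as floats. A sketch using the sample rows, with io.StringIO standing in for the real file (whose name the post does not give):

```python
import io

import pandas as pd

# The question's sample rows; io.StringIO stands in for the real CSV file.
csv_text = """CBSA Code,CBSA Title,violent,murder,rape,robbery,assault,property,burglary,larceny,vehicle theft
10180.0,"Abilene, TX",393.2,5.3,64.0,65.7,258.2,"2,361.9",534.0,"1,670.0",157.8
10420.0,"Akron, OH",361.6,6.4,48.7,73.0,233.6,"2,226.0",415.6,"1,659.4",150.9
10500.0,"Albany, GA",728.5,11.6,30.6,95.1,591.3,"3,734.5",773.4,"2,715.1",246.0
10580.0,"Albany-Schenectady-Troy, NY",283.7,2.2,38.3,62.4,180.8,"1,892.3",226.9,"1,584.8",80.6
"""

# thousands="," drops the separator inside quoted numbers such as "2,361.9".
df = pd.read_csv(io.StringIO(csv_text), thousands=",")

# "10180.0" parses as float64, so an explicit cast is still needed for integers.
df["CBSA Code"] = df["CBSA Code"].astype("int64")
print(df.dtypes)
```

On this clean sample, CBSA Code parses as float64 rather than object; if the real file still yields object, some entry in that column is not a clean number.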

That first column should be an integer. I've tried:

df['CBSA Code'].apply(np.int64)  AND

df['CBSA Code'].astype(int) AND

df['CBSA Code'].astype(str).astype(int) AND

df['CBSA Code'] = df['CBSA Code'].astype(str)
df['CBSA Code'] = df['CBSA Code'].replace(".0", '')
df['CBSA Code'] = df['CBSA Code'].astype('int')
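All of these attempts hit the same obstacle: Python's int() rejects a string containing a decimal point, such as '10180.0'. The last attempt has a second problem, because Series.replace compares whole cell values rather than substrings. A minimal sketch reproducing both behaviors on a small hypothetical Series:

```python
import pandas as pd

# Strings shaped like the question's CBSA Code column.
codes = pd.Series(["10180.0", "10420.0", "10500.0"], dtype=object)

# int("10180.0") is not a valid integer literal, so a direct cast fails.
try:
    codes.astype(str).astype(int)
    direct_cast_failed = False
except ValueError as exc:
    direct_cast_failed = True
    print("astype(int) raised:", exc)

# Series.replace compares whole cell values, so ".0" never matches "10180.0".
unchanged = codes.replace(".0", "")
print(unchanged.equals(codes))  # True

# The .str accessor works on substrings; regex=False keeps "." literal.
# Caveat: this also rewrites ".0" in the middle of a value (e.g. "10.05").
stripped = codes.str.replace(".0", "", regex=False).astype(int)
print(stripped.tolist())  # [10180, 10420, 10500]
```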

I've seen some of these posted as answers to other Stack Overflow questions, but they aren't working for me. This must be a common dilemma. Is there a canonical way of doing this?

The error message from df['CBSA Code'].apply(np.int64) follows:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-189-6c1c6381a02c> in <module>
----> 1 df['CBSA Code'].apply(np.int64)

~\AppData\Roaming\Python\Python37\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   3589             else:
   3590                 values = self.astype(object).values
-> 3591                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3592 
   3593         if len(mapped) and isinstance(mapped[0], Series):

pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()

ValueError: invalid literal for int() with base 10: '10180.0'

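If the problem is that the CBSA Code column holds floats formatted as strings (which is what the error message suggests: ValueError: invalid literal for int() with base 10: '10180.0'), then you can't convert it directly to int, but you can convert it to float first and then to int: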

df["CBSA Code"] = df["CBSA Code"].astype(float).astype(int)
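A quick check of the two-step cast on strings shaped like the question's data. The sketch also shows its limit: the float-to-int step raises if the column contains NaN (for example from blank cells), which is where a nullable integer dtype becomes useful.

```python
import pandas as pd

codes = pd.Series(["10180.0", "10420.0", "10500.0"])

# "10180.0" is a valid float literal, so string -> float -> int succeeds.
as_int = codes.astype(float).astype(int)
print(as_int.tolist())  # [10180, 10420, 10500]

# The float -> int step cannot represent NaN (e.g. from a blank cell) and raises.
with_gap = pd.Series([10180.0, float("nan")])
try:
    with_gap.astype(int)
    nan_cast_failed = False
except ValueError:  # recent pandas raises IntCastingNaNError, a ValueError subclass
    nan_cast_failed = True
print(nan_cast_failed)  # True
```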

I suspect CBSA Code has some non-numeric values, so read_csv defaults it to dtype object. You may try using the nullable integer dtype Int64 (note: it is an uppercase 'I'):

df['CBSA Code'] = pd.to_numeric(df['CBSA Code'], errors='coerce').astype('Int64')
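A sketch of what errors='coerce' does with entries that are not numbers; the sample values here are hypothetical. Every unparseable entry becomes missing, and Int64 can hold that missing value where plain int64 cannot:

```python
import pandas as pd

# Hypothetical column: two valid codes, a text placeholder, and a missing cell.
codes = pd.Series(["10180.0", "10420.0", "n/a", None])

# errors="coerce" turns anything unparseable into NaN instead of raising.
numeric = pd.to_numeric(codes, errors="coerce")

# Int64 (capital I) is pandas' nullable integer dtype, so NaN becomes <NA>.
result = numeric.astype("Int64")
print(result.tolist())

# Coercion is silent; list the entries that were present but not numeric.
dropped = codes[numeric.isna() & codes.notna()]
print(dropped.tolist())  # ['n/a']
```

Checking what was coerced away is worthwhile here, since a hidden character (like the Unicode problem the asker eventually found) would turn a valid-looking code into a silent <NA>.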

