
Convert a Column's type using its position/index

I am reading some .csv files from a folder and trying to create a list of data frames, one per file.

In some files, the values in one column (Quantity) are a mix of str and float64. I am therefore trying to convert that column to int.

I am accessing my columns by position/index (for automation purposes).

This is one of the data frames from the list:

    CustName    ProductID   Quantity
0   56MED       110         '1215.0'
1   56MED       112         5003.0
2   56MED       114         '6822.0'
3   WillSup     2285        5645.0
4   WillSup     5622        6523.0
5   HammSup     9522        1254.0
6   HammSup     6954        5642.0

My code looks like this:

df.columns[2] = pd.to_numeric(df.columns[2], errors='coerce').astype(str).astype(np.int64)

I am getting:

TypeError: Index does not support mutable operations

Prior to this, I tried:

df.columns[2] = pd.to_numeric(df.columns[2], errors='coerce').fillna(0).astype(str).astype(np.int64)

However, I got this error:

AttributeError: 'numpy.float64' object has no attribute 'fillna'

There are posts that use column names directly, but none that use the column position. How can I convert my column to int using the column position/index in pandas?

My pandas version:

print(pd.__version__)
>> 0.23.3

df.columns[2] returns a scalar, in this case a string: the column's label, not its data.
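A small sketch can make the difference between the label and the data concrete. The data frame below is a hypothetical stand-in for the one in the question, not the asker's actual file:

```python
import pandas as pd

# Hypothetical stand-in for one of the question's data frames
df = pd.DataFrame({
    "CustName": ["56MED", "WillSup"],
    "ProductID": [110, 2285],
    "Quantity": ["1215.0", 5645.0],
})

label = df.columns[2]    # the column *label*: a plain string
column = df.iloc[:, 2]   # the column *data*: a pandas Series

print(type(label).__name__, repr(label))
print(type(column).__name__)

# Passing the label to pd.to_numeric tries to parse the text "Quantity".
# With errors="coerce" that yields a scalar NaN, which has no .fillna method.
coerced = pd.to_numeric(label, errors="coerce")
print(coerced)
```

This appears to account for both errors in the question: the right-hand side works on a scalar rather than a Series, and the left-hand side tries to assign into the immutable column Index.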

To access a series, use df['Quantity'], df.iloc[:, 2], or even df[df.columns[2]]. Instead of chaining repeated conversions, if you are sure the data should be integers, pass downcast='integer' to pd.to_numeric.

All of these are equivalent:

df['Quantity'] = pd.to_numeric(df['Quantity'], errors='coerce', downcast='integer')

df.iloc[:, 2] = pd.to_numeric(df.iloc[:, 2], errors='coerce', downcast='integer')

df[df.columns[2]] = pd.to_numeric(df[df.columns[2]], errors='coerce', downcast='integer')
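As a minimal end-to-end sketch of the positional form, the example below uses made-up data with mixed str and float values and assumes there are no literal quote characters in the strings. It uses the df[label] = ... form because newer pandas releases may write df.iloc[:, 2] = ... into the existing object column instead of replacing it, which would leave the dtype unchanged:

```python
import pandas as pd

# Made-up sample: Quantity mixes str and float values
df = pd.DataFrame({
    "CustName": ["56MED", "56MED", "WillSup"],
    "ProductID": [110, 112, 2285],
    "Quantity": ["1215.0", 5003.0, 5645.0],
})

col = df.columns[2]  # look up the label by position

# Parse everything to numbers; downcast picks the smallest integer dtype
# that can hold the values, provided every value is a whole number
df[col] = pd.to_numeric(df[col], errors="coerce", downcast="integer")

print(df.dtypes)
print(df)
```

Note that with errors='coerce', any entry that cannot be parsed becomes NaN, and a column containing NaN stays float64 regardless of downcast.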

Try this. You need to remove the quotes from your strings first, then use pd.to_numeric:

df.iloc[:, 2] = pd.to_numeric(df.iloc[:, 2].str.strip('\'')).astype(int)

Or, from @jpp:

df['Quantity'] = pd.to_numeric(df['Quantity'].str.strip('\''), errors='coerce', downcast='integer')

Output of df.info():

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7 entries, 0 to 6
Data columns (total 3 columns):
CustName     7 non-null object
ProductID    7 non-null int64
Quantity     7 non-null int32
dtypes: int32(1), int64(1), object(1)
memory usage: 196.0+ bytes

Output:

  CustName  ProductID  Quantity
0    56MED        110      1215
1    56MED        112      5003
2    56MED        114      6822
3  WillSup       2285      5645
4  WillSup       5622      6523
5  HammSup       9522      1254
6  HammSup       6954      5642
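Since the question works with a list of data frames, the same idea can be wrapped in a helper and applied to each one. This is only a sketch under stated assumptions: quantity_to_int and the sample frames are hypothetical, the astype(str) step is added so that float entries survive the .str accessor, and replacing unparseable values with 0 mirrors the fillna(0) in the question's own attempt rather than anything the answers recommend.

```python
import pandas as pd

def quantity_to_int(frame, position=2):
    """Return a copy of `frame` with the column at `position` cast to int64."""
    out = frame.copy()
    col = out.columns[position]
    # Convert every entry to text first, then drop surrounding quote characters
    cleaned = out[col].astype(str).str.strip("'")
    # Unparseable entries become NaN, then 0 (an assumption), before the cast
    out[col] = pd.to_numeric(cleaned, errors="coerce").fillna(0).astype("int64")
    return out

# Hypothetical frames standing in for the ones read from the .csv files
frames = [
    pd.DataFrame({
        "CustName": ["56MED", "56MED"],
        "ProductID": [110, 112],
        "Quantity": ["'1215.0'", 5003.0],
    }),
    pd.DataFrame({
        "CustName": ["WillSup"],
        "ProductID": [2285],
        "Quantity": [5645.0],
    }),
]

converted = [quantity_to_int(f) for f in frames]
for frame in converted:
    print(frame.dtypes["Quantity"], frame["Quantity"].tolist())
```

Returning a copy keeps the original frames untouched, which makes it easier to compare before and after when checking files with unexpected values.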
