Convert the type of a column using its position/index

Question

I am reading some .csv files from a folder and trying to create a list of data frames, one per file.

In some files the values of one column, Quantity, are a mix of str and float64. I am therefore trying to convert that Quantity column to int.

I access my columns by position/index (for automation purposes).

This is one of the data frames in the list:

    CustName    ProductID   Quantity
0   56MED       110         '1215.0'
1   56MED       112         5003.0
2   56MED       114         '6822.0'
3   WillSup     2285        5645.0
4   WillSup     5622        6523.0
5   HammSup     9522        1254.0
6   HammSup     6954        5642.0

My attempt looks like this:

df.columns[2] = pd.to_numeric(df.columns[2], errors='coerce').astype(str).astype(np.int64)

I get:

TypeError: Index does not support mutable operations

Prior to this, I tried:

df.columns[2] = pd.to_numeric(df.columns[2], errors='coerce').fillna(0).astype(str).astype(np.int64)

However, I got this error:

AttributeError: 'numpy.float64' object has no attribute 'fillna'

There are posts that use column names directly, but none that use the column position. How can I convert my column to int using the column position/index in pandas?

My pandas version:

print(pd.__version__)
>> 0.23.3

Answer 1

df.columns[2] returns a scalar, in this case a string (the column label), not the column's data.
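A small check makes this concrete. The sketch below uses a made-up two-row frame (not the asker's data) to show that df.columns[2] is only the label, that pd.to_numeric on that label with errors='coerce' gives back a single NaN rather than a Series, and that assigning into df.columns raises the TypeError from the question.

```python
import math
import pandas as pd

# Hypothetical sample frame, shaped like the one in the question.
df = pd.DataFrame({
    "CustName": ["56MED", "WillSup"],
    "ProductID": [110, 2285],
    "Quantity": ["1215.0", 5645.0],
})

label = df.columns[2]  # the column label, a plain string
converted = pd.to_numeric(label, errors="coerce")  # a scalar NaN, not a Series

message = ""
try:
    df.columns[2] = "Qty"  # an Index cannot be modified in place
except TypeError as exc:
    message = str(exc)

print(type(label).__name__, converted, message)
```

Because the right-hand side is a scalar float, methods such as fillna are not available on it, which matches the AttributeError in the question.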

To access a series, use df['Quantity'], df.iloc[:, 2], or even df[df.columns[2]]. Instead of the repeated transformations, if you are sure your data should be integers, use downcast='integer'.

All these are equivalent:

df['Quantity'] = pd.to_numeric(df['Quantity'], errors='coerce', downcast='integer')

df.iloc[:, 2] = pd.to_numeric(df.iloc[:, 2], errors='coerce', downcast='integer')

df[df.columns[2]] = pd.to_numeric(df[df.columns[2]], errors='coerce', downcast='integer')
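Since the question is about a list of data frames, here is a minimal sketch of applying the positional conversion to each one. The helper name quantity_to_int and the sample frames are hypothetical. It looks the label up from the position and assigns by label, because in newer pandas releases an assignment through df.iloc[:, 2] may write into the existing column and keep its original dtype. Note also that with errors='coerce' any unparseable value becomes NaN, and a column containing NaN stays float rather than being downcast to an integer type.

```python
import pandas as pd

# Hypothetical frames standing in for the ones read from the .csv files.
frames = [
    pd.DataFrame({
        "CustName": ["56MED", "WillSup"],
        "ProductID": [110, 2285],
        "Quantity": ["1215.0", 5645.0],  # mixed str and float
    }),
    pd.DataFrame({
        "CustName": ["56MED", "WillSup"],
        "ProductID": [112, 5622],
        "Quantity": [5003.0, 6523.0],    # float only
    }),
]


def quantity_to_int(df, pos=2):
    """Return a copy of df with the column at position `pos` made numeric."""
    out = df.copy()
    name = out.columns[pos]  # translate the position into a label
    out[name] = pd.to_numeric(out[name], errors="coerce", downcast="integer")
    return out


converted_frames = [quantity_to_int(df) for df in frames]

for df in converted_frames:
    print(df.dtypes)
```

Returning a copy keeps the original frames untouched, which is convenient when the conversion needs to be rerun with different options.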

Answer 2

Try this. You need to remove those quotes from your strings first, then use pd.to_numeric:

df.iloc[:, 2] = pd.to_numeric(df.iloc[:, 2].str.strip('\'')).astype(int)

Or, from @jpp:

df['Quantity'] = pd.to_numeric(df['Quantity'].str.strip('\''), errors='coerce', downcast='integer')

Output of df.info():

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7 entries, 0 to 6
Data columns (total 3 columns):
CustName     7 non-null object
ProductID    7 non-null int64
Quantity     7 non-null int32
dtypes: int32(1), int64(1), object(1)
memory usage: 196.0+ bytes

Output:

  CustName  ProductID  Quantity
0    56MED        110      1215
1    56MED        112      5003
2    56MED        114      6822
3  WillSup       2285      5645
4  WillSup       5622      6523
5  HammSup       9522      1254
6  HammSup       6954      5642
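One caveat if the column really holds a mix of Python strings and floats, as the question describes: the .str accessor returns NaN for entries that are not strings, so stripping the quotes directly can lose the float rows. A possible workaround, sketched below on made-up data, is to cast everything to str before stripping. The fillna(0) step is an assumption about how missing or unparseable values should be treated; adjust it to your data.

```python
import pandas as pd

# Hypothetical frame: quoted strings and a real float in the same column.
df = pd.DataFrame({
    "CustName": ["56MED", "56MED", "56MED"],
    "ProductID": [110, 112, 114],
    "Quantity": ["'1215.0'", 5003.0, "'6822.0'"],
})

col = df.columns[2]  # look the label up from the position

# Cast to str first so the float entry survives the .str.strip call.
cleaned = df[col].astype(str).str.strip("'")

# Assumption: unparseable values should become 0 before the integer cast.
df[col] = pd.to_numeric(cleaned, errors="coerce").fillna(0).astype("int64")

print(df)
print(df.dtypes)
```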
