
Change dtypes in Pandas DataFrame cell-by-cell

Problem:

I have a Pandas.DataFrame which stores only unicode values. Each column contains values that could be converted to either an integer or a float, or left as unicode. (Python version 2.7.15, Pandas version 0.23.0)

import pandas as pd
df = pd.DataFrame({'x':[u'1', u'1.23', u'', u'foo_text'], 'y':[u'bar_text', u'', u'2', u'4.56']})
print df
          x         y
0         1  bar_text
1      1.23
2                   2
3  foo_text      4.56

I would like to convert the type of each cell as follows:

  1. If possible to convert to int, convert to int
  2. Else, if possible to convert to float, convert to float
  3. Else, leave as unicode

Solution attempts:

The following code does precisely what I want:

type_list = [int, float, unicode]
for column in df.columns:
    for index in df.index:
        for desired_type in type_list:
            try:
                df.loc[index,column] = desired_type(df.loc[index,column])
                break
            except ValueError:
                pass

The problem is that my actual DataFrame has more than 10 million cells, and this will take far too long to execute. I am trying to find a faster way to do this.

I have looked at pandas.DataFrame.infer_objects() and pandas.to_numeric(), but neither appears to handle the case of mixed types within a column.
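To illustrate the limitation mentioned above, here is a minimal sketch of what pandas.to_numeric() does with one of the sample columns. It is written for Python 3 (plain str instead of unicode), which is an assumption and not the setup in the question, and the variable names are only illustrative.

```python
import pandas as pd

# Same values as column 'x' in the question, as Python 3 strings.
s = pd.Series(['1', '1.23', '', 'foo_text'], dtype=object)

# errors='coerce' replaces anything that cannot be parsed with NaN,
# so the text cells are lost and the column gets one float dtype.
numeric = pd.to_numeric(s, errors='coerce')

print(numeric.dtype)
print(numeric.tolist())
```

The result is a single float column: the integer-like cell becomes 1.0 and the text cells become NaN, which is not the per-cell behaviour asked for.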

Try using a function along with .apply(), which should be a lot faster than three nested for-loops because it avoids a separate df.loc lookup and assignment for every cell.

So something like:

def change_dtype(value):
    try:
        return int(value)
    except ValueError:
        try:
            return float(value)
        except ValueError:
            return value

for column in df.columns:
    df.loc[:, column] = df[column].apply(change_dtype)
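As a rough check of the approach above, the following sketch runs the same idea on the sample data from the question and inspects the resulting cell types. It assumes Python 3 (so str stands in for unicode). The helper name convert_cell, the use of Series.map, and the extra TypeError handling are illustrative choices and not part of the original answer.

```python
import pandas as pd

def convert_cell(value):
    """Return value as int if possible, else as float if possible, else unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except (ValueError, TypeError):
            # TypeError covers non-string cells such as None; this is an
            # addition here, the original answer only catches ValueError.
            continue
    return value

# Sample data from the question, as Python 3 strings.
df = pd.DataFrame({
    'x': ['1', '1.23', '', 'foo_text'],
    'y': ['bar_text', '', '2', '4.56'],
}, dtype=object)

# Convert one column at a time. Each column stays dtype object because
# its cells now hold a mix of int, float and str.
converted = pd.DataFrame({col: df[col].map(convert_cell) for col in df.columns})

for col in converted.columns:
    print(col, [type(v).__name__ for v in converted[col]])
```

One caveat: float() also accepts strings such as 'nan' and 'inf', so cells containing that text would be converted to floats and not left as strings.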
