Change dtypes in Pandas DataFrame cell-by-cell
Problem:
I have a pandas DataFrame that stores only unicode values. Each column contains values that could be converted to either an integer or a float, or left as unicode (Python 2.7.15, pandas 0.23.0).
```python
import pandas as pd

df = pd.DataFrame({'x': [u'1', u'1.23', u'', u'foo_text'],
                   'y': [u'bar_text', u'', u'2', u'4.56']})
print df
```

```
          x         y
0         1  bar_text
1      1.23
2                   2
3  foo_text      4.56
```
I would like to convert the type of each cell as follows: to an int if possible, otherwise to a float, and if neither conversion works, leave it as unicode.
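Applying that order (int, then float, then unicode, the same order as `type_list` in the solution attempt) to every cell gives the target types for the example frame. This is a sketch written for Python 3, where `str` plays the role of Python 2's `unicode`; `convert_cell` is an illustrative name, not part of the question.

```python
import pandas as pd


def convert_cell(value):
    """Try int first, then float; if both fail, keep the original string."""
    for desired_type in (int, float):
        try:
            return desired_type(value)
        except ValueError:
            pass  # this type does not fit, try the next one
    return value


# The example frame, with plain Python 3 strings.
df = pd.DataFrame({'x': ['1', '1.23', '', 'foo_text'],
                   'y': ['bar_text', '', '2', '4.56']})

# Name of the resulting type for every cell, laid out like df itself.
types = pd.DataFrame({col: [type(convert_cell(v)).__name__ for v in df[col]]
                      for col in df.columns})
print(types)
# x: int, float, str, str
# y: str, str, int, float
```

Note that the empty string stays a string: both `int('')` and `float('')` raise `ValueError`.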
Solution attempts:
The following code does precisely what I want:
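```python
type_list = [int, float, unicode]  # conversion order: int, then float, else unicode

for column in df.columns:
    for index in df.index:
        for desired_type in type_list:
            try:
                df.loc[index, column] = desired_type(df.loc[index, column])
                break  # the first successful conversion wins
            except ValueError:
                pass  # this type does not fit, try the next one
```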
The problem is that my actual DataFrame has more than 10 million cells, so this takes far too long to run. I am trying to find a faster way to do this.
I have looked at `pandas.DataFrame.infer_objects()` and `pandas.to_numeric()`, but neither appears to handle the case of mixed types within a column.
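A quick check of why those two functions do not fit, run on column `x` of the example. This is a sketch for Python 3 and a recent pandas, not the asker's Python 2.7 / pandas 0.23 environment; the column is forced to `object` dtype to mirror the original frame.

```python
import pandas as pd

# Column 'x' from the question, stored as object dtype.
s = pd.Series(['1', '1.23', '', 'foo_text'], dtype=object)

# 1) Default errors='raise': one unparseable string aborts the whole column.
try:
    pd.to_numeric(s)
    raised = False
except ValueError:
    raised = True
print(raised)

# 2) errors='coerce': parses what it can and turns every other cell into NaN,
#    so '' and 'foo_text' are lost and '1' comes back as a float.
coerced = pd.to_numeric(s, errors='coerce')
print(coerced.dtype)
print(coerced.isna().tolist())

# 3) infer_objects() picks one dtype for the whole column; a column of
#    strings stays non-numeric and its cells are still strings.
inferred = s.infer_objects()
print(pd.api.types.is_numeric_dtype(inferred.dtype))
```

In short, `to_numeric` is all-or-nothing per column (raise, or coerce to NaN) and `infer_objects` chooses a single dtype per column, so neither can keep `1` as an int, `1.23` as a float and `foo_text` as a string side by side.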
Answer:
Try using a function together with `.apply()`, which will be a lot faster than three nested for-loops: the function is still called once per cell, but you avoid the expensive per-cell `df.loc[index, column]` reads and writes.
So something like:
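```python
def change_dtype(value):
    # Try int first, then float; if both fail, return the value unchanged.
    try:
        return int(value)
    except ValueError:
        try:
            return float(value)
        except ValueError:
            return value

# Convert one whole column at a time instead of one cell at a time.
for column in df.columns:
    df[column] = df[column].apply(change_dtype)
```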
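If many cells share the same string (an assumption about the real 10-million-cell data, not something the question states), a further possible speed-up is to run `change_dtype` once per distinct value and map the results back. This is a sketch for Python 3 (where `unicode` is `str`); `convert_frame` is a hypothetical helper name.

```python
import pandas as pd


def change_dtype(value):
    """Return int(value) if possible, else float(value), else value unchanged."""
    try:
        return int(value)
    except ValueError:
        try:
            return float(value)
        except ValueError:
            return value


def convert_frame(df):
    """Apply change_dtype to every cell, calling it once per distinct value.

    Assumes the cells are hashable strings and that many of them repeat.
    """
    obj = df.astype(object)
    # Distinct cell values across the whole frame.
    uniques = pd.unique(obj.to_numpy().ravel())
    # Convert each distinct value exactly once.
    lookup = {value: change_dtype(value) for value in uniques}
    # Replace every cell by its precomputed conversion, column by column.
    return pd.DataFrame({col: obj[col].map(lookup) for col in obj.columns},
                        index=obj.index)


df = pd.DataFrame({'x': ['1', '1.23', '', 'foo_text'],
                   'y': ['bar_text', '', '2', '4.56']})
print(convert_frame(df))
```

This still calls a Python function per distinct value, so the benefit depends entirely on how repetitive the data is; on all-unique data it does the same work as `.apply()` plus the cost of building the lookup table.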