Removing dash string from mixed dtype column in pandas DataFrame
I have a DataFrame column that may contain strings (object values) mixed with numerical values.
My goal is to convert every value to a plain integer; however, some of the values have - between the digits.
A minimal working example looks like:
import pandas as pd
d = {'API':[float(4433), float(3344), 6666, '6-9-11', '8-0-11', 9990]}
df = pd.DataFrame(d)
I tried:
df['API'] = df['API'].str.replace('-','')
But this leaves me with nan for the numeric values, because the .str accessor only operates on the string entries and returns nan for everything else.
The output is:
API
nan
nan
nan
6911
8011
nan
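The behaviour above can be reproduced with a short sketch. It assumes only the example data from the question; the variable name replaced is introduced here for illustration. It shows that the column holds a mix of Python types and that .str.replace returns NaN for every element that is not a string.

```python
import pandas as pd

d = {'API': [float(4433), float(3344), 6666, '6-9-11', '8-0-11', 9990]}
df = pd.DataFrame(d)

# The column mixes floats, ints and strings, so pandas stores it as object dtype.
print(df['API'].dtype)
print(df['API'].map(type).value_counts())

# .str methods only process string elements; non-strings come back as NaN.
replaced = df['API'].str.replace('-', '')
print(replaced.isna().tolist())
```

So the numeric entries are not lost from the original column; they are simply missing from the result of the string operation.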
I'd like the output to be:
API
4433
3344
6666
6911
8011
9990
where all values are of type int.
Is there an easy way to handle just the object (string) values in the Series while leaving the actual numeric values intact? I'm using this technique on large data sets (300,000+ lines), so something like a lambda or Series operations would be preferred over a loop search.
Use df.replace with regex=True:
df = df.replace('-', '', regex=True).astype(int)
API
0 4433
1 3344
2 6666
3 6911
4 8011
5 9990
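If the frame has other columns that should not be touched, the same idea can be applied to a single column. The following is a minimal sketch, assuming the example data from the question and using Series.replace instead of DataFrame.replace; it splits the two steps so each can be inspected.

```python
import pandas as pd

df = pd.DataFrame({'API': [float(4433), float(3344), 6666, '6-9-11', '8-0-11', 9990]})

# With regex=True, replace rewrites only the string entries;
# floats and ints pass through unchanged.
df['API'] = df['API'].replace('-', '', regex=True)

# Every element is now something int() accepts, so the cast succeeds.
df['API'] = df['API'].astype(int)

print(df['API'].tolist())
print(df['API'].dtype)
```

Note that the cast would raise if the column contained real missing values, since NaN cannot be converted to int.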
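Alternatively, convert everything to str first. The floats then become strings such as '4433.0', which int cannot parse directly, so cast through float before int:
df['API'] = df['API'].astype(str).apply(lambda x: x.replace('-', '')).astype(float).astype(int)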
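To address the question's wish to touch only the string entries, one possible sketch combines .str.replace with Series.where. The helper name strip_dashes_to_int is hypothetical and not part of pandas, and the sketch assumes the column has no genuinely missing values.

```python
import pandas as pd

def strip_dashes_to_int(series: pd.Series) -> pd.Series:
    """Hypothetical helper: remove '-' from string entries, then cast all to int."""
    # .str.replace returns NaN for every non-string entry.
    cleaned = series.str.replace('-', '', regex=False)
    # Keep the original value where nothing was cleaned, else take the cleaned string.
    merged = series.where(cleaned.isna(), cleaned)
    # Assumes no real NaN values remain; int() handles floats, ints and digit strings.
    return merged.astype(int)

df = pd.DataFrame({'API': [float(4433), float(3344), 6666, '6-9-11', '8-0-11', 9990]})
df['API'] = strip_dashes_to_int(df['API'])
print(df['API'].tolist())
```

The string operation and the where call are both vectorized Series operations, so no explicit Python loop over the rows is needed.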