
Removing dash string from mixed dtype column in pandas DataFrame

I have a DataFrame column in which string (object) values are mixed with numeric values.

My goal is to convert every value to a plain integer, but some of the values contain - between the digits.

A minimal working example looks like:

import pandas as pd

d = {'API':[float(4433), float(3344), 6666, '6-9-11', '8-0-11', 9990]}
df = pd.DataFrame(d)

I tried:

df['API'] = df['API'].str.replace('-','')

But this leaves me with nan for the numeric entries, because the .str accessor only operates on strings and returns nan for every non-string element.

The output is:

API

nan
nan
nan
6911
8011
nan

The output I want is:

API

4433
3344
6666
6911
8011
9990

Where all types are int.

Is there an easy way to process just the string entries in the Series while leaving the actual numeric values intact? I'm using this technique on large data sets (300,000+ rows), so something like a lambda or vectorized Series operations would be preferred over an explicit loop.

Use df.replace with regex=True:

df = df.replace('-', '', regex=True).astype(int)

    API
0   4433
1   3344
2   6666
3   6911
4   8011
5   9990
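To illustrate why this works where the .str accessor did not, here is a small sketch on a shortened version of the column (the variable names are only for illustration). It assumes a recent pandas release and compares the two calls side by side: .str.replace produces missing values for non-string elements, while replace with regex=True rewrites the matching strings and passes other values through unchanged.

```python
import pandas as pd

# Shortened, hypothetical sample: a float, an int and a dashed string
s = pd.Series([float(4433), 6666, '6-9-11'], dtype=object)

# The .str accessor yields NaN for every non-string element
via_str = s.str.replace('-', '', regex=False)

# replace(..., regex=True) only rewrites strings that match the pattern
via_replace = s.replace('-', '', regex=True)

print(via_str.isna().tolist())
print(via_replace.tolist())
```

After the replace step the column still has object dtype, which is why the answer finishes with astype(int).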

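Also, you can convert everything to strings first. Go through float before int, because astype(str) renders 4433.0 as '4433.0', which cannot be parsed directly as an int:

df['API'] = df['API'].astype(str).apply(lambda x: x.replace('-', '')).astype(float).astype(int)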
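Another option, sketched here as an assumption rather than as part of the original answer, is to clean only the string entries through a boolean mask and then convert the whole column with pd.to_numeric. The helper name strip_dashes_to_int is hypothetical.

```python
import pandas as pd


def strip_dashes_to_int(series: pd.Series) -> pd.Series:
    """Remove '-' from string entries only, then convert everything to int."""
    cleaned = series.astype(object).copy()
    # Boolean mask marking the entries that are Python strings
    is_str = cleaned.map(lambda v: isinstance(v, str))
    # Apply .str.replace to the string entries only; numbers are left alone
    cleaned[is_str] = cleaned[is_str].str.replace('-', '', regex=False)
    # to_numeric parses the cleaned strings; astype(int) gives an integer dtype
    return pd.to_numeric(cleaned).astype(int)


d = {'API': [float(4433), float(3344), 6666, '6-9-11', '8-0-11', 9990]}
df = pd.DataFrame(d)
df['API'] = strip_dashes_to_int(df['API'])
print(df['API'].tolist())
```

The mask is still built with a Python-level lambda, so on very large columns it is worth timing this against df.replace before choosing one.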
