
Removing dash string from mixed dtype column in pandas Dataframe

I have a DataFrame column in which strings are mixed with numeric values.

My goal is to convert every value to a plain integer; however, some of the values are strings with dashes ("-") between the digits.

A minimal working example looks like:

import pandas as pd

d = {'API':[float(4433), float(3344), 6666, '6-9-11', '8-0-11', 9990]}
df = pd.DataFrame(d)
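A quick way to confirm what the column actually holds (a sketch, assuming a recent pandas; not part of the original question) is to check its dtype and the Python type of each element:

```python
import pandas as pd

d = {'API': [float(4433), float(3344), 6666, '6-9-11', '8-0-11', 9990]}
df = pd.DataFrame(d)

# Mixing numbers and strings in one column forces the generic object dtype.
print(df['API'].dtype)  # object

# Each element keeps its own Python type: float, int or str.
print(df['API'].map(type).value_counts())
```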

I tried:

df['API'] = df['API'].str.replace('-','')

But this leaves me with NaN for the numeric values, because the .str accessor only operates on string elements and returns NaN for everything else (here, the floats and ints).
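One way around this, sketched here as an illustration rather than taken from the original post, is to let `.str.replace` handle the strings and then put the untouched numbers back with `fillna`, using the original Series as the fill value. `pd.to_numeric` is used for the final conversion because it parses both numbers and digit strings:

```python
import pandas as pd

s = pd.Series([float(4433), float(3344), 6666, '6-9-11', '8-0-11', 9990])

# .str methods return NaN for every non-string element (the floats and ints here).
stripped = s.str.replace('-', '', regex=False)
print(stripped.isna().tolist())  # [True, True, True, False, False, True]

# Restore the original numeric values wherever .str produced NaN,
# parse everything as numbers, then cast to integers.
result = pd.to_numeric(stripped.fillna(s)).astype('int64')
print(result.tolist())  # [4433, 3344, 6666, 6911, 8011, 9990]
```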

The output is:

API

nan
nan
nan
6911
8011
nan

I'd like the output to be:

API

4433
3344
6666
6911
8011
9990

where every value is an int.

Is there an easy way to process only the string (object) values in the Series while leaving the actual numeric values intact? I'm applying this to large data sets (300,000+ rows), so Series operations, or at least something like a lambda, would be preferred over an explicit loop.
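A sketch of touching only the string elements without an explicit loop: build a boolean mask of the positions that hold a `str`, and rewrite just those rows. The helper name `strip_dashes` is hypothetical, and it assumes every non-string value is already a whole number:

```python
import pandas as pd


def strip_dashes(s: pd.Series) -> pd.Series:
    """Remove '-' from the string elements of s and return an int64 Series.

    Hypothetical helper: assumes non-string values are already whole numbers.
    """
    # True where the element is a Python str, False for floats and ints.
    is_str = s.map(lambda v: isinstance(v, str))
    out = s.copy()
    # Only the string rows go through .str.replace; numeric rows are untouched.
    out[is_str] = s[is_str].str.replace('-', '', regex=False)
    # Parse numbers and digit strings into one numeric dtype, then cast.
    return pd.to_numeric(out).astype('int64')


df = pd.DataFrame({'API': [float(4433), float(3344), 6666, '6-9-11', '8-0-11', 9990]})
df['API'] = strip_dashes(df['API'])
print(df['API'].tolist())
```

The mask step still calls a small Python function per element, but the string replacement and the numeric conversion run as Series operations.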

Use df.replace with regex=True, then cast to int:

df = df.replace('-', '', regex=True).astype(int)

    API
0   4433
1   3344
2   6666
3   6911
4   8011
5   9990
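With regex=True, replace only rewrites string elements; floats and ints pass through unchanged, which is why the numeric rows survive. Because df.replace scans every column, a sketch that restricts the change to API may be safer if the real frame has other text columns containing dashes. The `date` column below is purely illustrative, and `pd.to_numeric` is swapped in for the final conversion:

```python
import pandas as pd

df = pd.DataFrame({
    'API': [float(4433), float(3344), 6666, '6-9-11', '8-0-11', 9990],
    # Illustrative extra column whose dashes should be kept.
    'date': ['2020-01-01'] * 6,
})

# Series.replace with regex=True applies the pattern to string elements only,
# so only the API column is modified.
df['API'] = pd.to_numeric(df['API'].replace('-', '', regex=True)).astype('int64')

print(df['API'].tolist())
print(df['date'].iloc[0])  # 2020-01-01
```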

Alternatively:

df['API'] = df['API'].astype(str).apply(lambda x: x.replace('-', '')).astype(float).astype(int)
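The astype(str) route needs care with float values: str(float(4433)) is '4433.0', and int('4433.0') raises ValueError, so strings produced from floats have to be parsed as numbers rather than passed straight to int. A sketch of the same idea using the vectorised .str.replace instead of apply with a lambda:

```python
import pandas as pd

s = pd.Series([float(4433), float(3344), 6666, '6-9-11', '8-0-11', 9990])

as_text = s.astype(str)
print(as_text.tolist())  # floats become '4433.0' and '3344.0'

# int() cannot parse a decimal string directly.
try:
    int('4433.0')
except ValueError as exc:
    print('ValueError:', exc)

# Vectorised string replace, then parse the text as numbers before casting.
result = pd.to_numeric(as_text.str.replace('-', '', regex=False)).astype('int64')
print(result.tolist())
```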

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. Any question please contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM