
Converting a DataFrame column of mixed types to int, ignoring values with non-numeric characters

df:

    IDs
0  text
1   001
2     1

df = pd.DataFrame({'IDs': ['text', '001', '1']})

I'd like to convert the values to int where possible, so that strings referring to the same entity, 001 and 1, become identical values once the leading zeros are dropped.

This is demonstrated in the pandas documentation, but neither df['IDs'] = pd.to_numeric(df['IDs'], errors='ignore') nor df['IDs'] = df['IDs'].astype(int, errors='ignore') changes anything.

What am I doing wrong?

This is expected. The docs for to_numeric say:

If 'ignore', then invalid parsing will return the input.

This means that if at least one value cannot be parsed, the whole input is returned unchanged, including the values that could have been converted.
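
For contrast, here is a minimal sketch of how errors='coerce' behaves on the same data. Unlike 'ignore', it works element by element, replacing each unparseable value with NaN. The variable names s and coerced are only illustrative. Note that the presence of NaN makes the result a float column, so the original text is lost rather than kept.

```python
import pandas as pd

# Same values as the question's column.
s = pd.Series(['text', '001', '1'])

# errors='coerce' parses each element independently:
# values that cannot be parsed become NaN instead of aborting the conversion.
coerced = pd.to_numeric(s, errors='coerce')

print(coerced.isna().tolist())    # which elements failed to parse
print(coerced.dropna().tolist())  # the successfully parsed values
```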

A possible solution is to use a custom function with try-except:

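import pandas as pd

df = pd.DataFrame({'IDs': ['text', '001', '1']})

def func(x):
    # Convert to int where possible; otherwise return the value unchanged.
    try:
        return int(x)
    except (ValueError, TypeError):
        return x

df['IDs'] = df['IDs'].apply(func)
print(df)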
    IDs
0  text
1     1
2     1
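
An alternative that avoids raising exceptions might look like the sketch below. It is not part of the original answer: it assumes the numeric strings are integer-like (a value such as '1.5' would be truncated to 1, whereas int('1.5') raises), and the names numeric and converted are only illustrative. Parsing goes through float, so very large IDs could lose precision.

```python
import pandas as pd

df = pd.DataFrame({'IDs': ['text', '001', '1']})

# Parse element by element; unparseable values become NaN.
numeric = pd.to_numeric(df['IDs'], errors='coerce')

# Keep the original value where parsing failed, otherwise use the integer.
# Assumption: parsed values are whole numbers, so int() does not discard anything.
converted = [
    int(num) if pd.notna(num) else raw
    for raw, num in zip(df['IDs'], numeric)
]
df['IDs'] = converted

print(df['IDs'].tolist())
```

The resulting column holds a mix of strings and integers, so it keeps the object dtype, just as with the apply-based version.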

