Converting a dataframe column of mixed types to int, ignoring values with non-numeric characters

Question

df:

    IDs
0  text
1   001
2     1

df = pd.DataFrame({'IDs': ['text', '001', '1']})

I'd like to convert the values to int where possible, so that strings referring to the same entity, such as 001 and 1, become identical values once the '00' prefix is dropped.

This is demonstrated in the pandas documentation, but neither df['IDs'] = pd.to_numeric(df['IDs'], errors='ignore') nor df['IDs'] = df['IDs'].astype(int, errors='ignore') changes anything.

What am I doing wrong?

Answer 1

This is expected. The docs for to_numeric say:

If 'ignore', then invalid parsing will return the input.

This means that if at least one value cannot be parsed, the whole input is returned unchanged.
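
For comparison, errors='coerce' works element by element rather than all-or-nothing: values that fail to parse become NaN and the rest are converted. A minimal sketch using the same three sample values (the names ids and numeric are only for illustration):

```python
import pandas as pd

ids = pd.Series(['text', '001', '1'])

# errors='coerce' parses each element on its own;
# anything that cannot be parsed is replaced by NaN
numeric = pd.to_numeric(ids, errors='coerce')

# 'text' is now NaN, while '001' and '1' both parsed to the number 1
print(numeric)
```

Note that this replaces 'text' with NaN instead of keeping it, so on its own it does not give the result asked for in the question.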

A possible solution is to use a custom function with try-except:

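import pandas as pd

df = pd.DataFrame({'IDs': ['text', '001', '1']})

def func(x):
    # Return x as an int when it parses, otherwise return it unchanged
    try:
        return int(x)
    except (ValueError, TypeError):
        return x

df['IDs'] = df['IDs'].apply(func)
print(df)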
    IDs
0  text
1     1
2     1
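
If a vectorized variant is preferred, one option is to parse with pd.to_numeric(..., errors='coerce') and write back only the rows that parsed. This is a sketch under an assumption, not a drop-in equivalent of func: to_numeric also accepts strings such as '1.5', which the cast to int64 below would truncate, so it assumes every value is either a whole number or non-numeric text. The names numeric, mask and result are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'IDs': ['text', '001', '1']})

# Parse element by element; values that cannot be parsed become NaN
numeric = pd.to_numeric(df['IDs'], errors='coerce')
mask = numeric.notna()

# Work on an object-dtype copy so int and str values can share one column
result = df['IDs'].astype(object)

# Overwrite only the rows that parsed, cast to integers
# (assumes the parsed values are whole numbers)
result.loc[mask] = numeric[mask].astype('int64')

df['IDs'] = result
print(df)
```

The column keeps object dtype, as it does with the apply version, because it still mixes strings and integers.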


solution1
1 ACCEPTED 2020-10-20 07:51:53