
How to convert a column with object dtype to float in a pandas dataframe?

I have a dataframe with a column named 'height' and I want to convert its values to float. The default unit is meters, but some values are incorrectly formatted or given in feet and inches. It looks like this:

        height
0          16
1           7
2           7
3         6 m
4        2.40
5        5'8"
6          3m
7         6,9
8       9;6;3
9     Unknown
10       4.66
11 Bilinmiyor
12     11' 4"

dtype: object

Basically, I need to convert values in feet/inches to meters, convert values like Bilinmiyor and Unknown to NaN, remove unit suffixes like m, replace the comma in decimal numbers with ., and keep the largest number for a value like 9;6;3. The final dtype should be float or int.

I am new to Python, so I don't know the more advanced techniques yet. I tried to do this with

def to_num(a):
    try:
        return float(pd.to_numeric(a, errors = 'raise'))
    except ValueError:
        return a

df['height'] = to_num(df['height'])

but it didn't work. I wondered whether I should iterate instead, but looping over every cell in this column seems very complicated, because the dataset has more than 2 million rows.
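
A note on why the attempt above probably has no effect: `to_num` receives the whole column at once, `pd.to_numeric(..., errors='raise')` raises a `ValueError` at the first value it cannot parse, and the `except` branch then returns the original Series untouched. The sketch below reproduces that with a few made-up sample values (not the real dataset) and shows `errors='coerce'` as one possible alternative:

```python
import pandas as pd

# Made-up sample values, for illustration only.
s = pd.Series(['16', '6 m', '2.40', 'Unknown'])

def to_num(a):
    try:
        return float(pd.to_numeric(a, errors='raise'))
    except ValueError:
        return a  # reached as soon as one value fails to parse

unchanged = to_num(s)  # same values as s, nothing was converted

# errors='coerce' turns unparseable values into NaN instead of raising.
coerced = pd.to_numeric(s, errors='coerce')
print(coerced)
```

Note that `errors='coerce'` alone would also turn `6 m` into NaN, so the strings still need cleaning before the conversion.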

I feel you mate, I had the same kind of problem. Thankfully, this is not that hard:

import pandas as pd

df = pd.DataFrame({'height': [16, 7, '6m', '2.4', '3,5', 'Asdf', '9;6;3']})
df['height'] = df['height'].astype(str)  # force type str
df['height'] = df['height'].str.replace(',', '.', regex=False)  # decimal comma -> decimal point
df['height'] = df['height'].str.replace('[A-Za-z]', '', regex=True)  # remove all letters (regex)
# split on ';', convert each piece to a number, then keep the largest one per row
parts = df['height'].str.split(';', expand=True).apply(pd.to_numeric, errors='coerce')
df['height'] = parts.max(axis=1)  # float; values that cannot be parsed become NaN

And you get

   height
0    16.0
1     7.0
2     6.0
3     2.4
4     3.5
5     NaN
6     9.0
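
The snippet above does not cover the feet/inches values from the question (`5'8"`, `11' 4"`). One way to handle them, as a rough sketch: pull feet and inches out with a regex first, convert them to meters, and fall back to the metric cleaning for every other row. The regex pattern, the sample values, and the helper names (`ft_in`, `imperial_m`, `metric_m`) are assumptions made for illustration, and anything that does not match the pattern is assumed to be in meters already.

```python
import pandas as pd

# Made-up sample in the spirit of the question's data.
df = pd.DataFrame({'height': [16, '6 m', "5'8\"", '6,9', '9;6;3', 'Unknown', "11' 4\""]})
s = df['height'].astype(str).str.strip()

# Rows that look like feet/inches, e.g. 5'8" or 11' 4" (assumed pattern).
# Non-matching rows come back as NaN in both columns.
ft_in = s.str.extract(r"^(\d+)\s*'\s*(\d+)?\s*\"?$").astype(float)
imperial_m = ft_in[0] * 0.3048 + ft_in[1].fillna(0) * 0.0254

# Everything else: decimal comma -> point, drop letters and whitespace,
# then keep the largest number among ';'-separated pieces.
cleaned = s.str.replace(',', '.', regex=False)
cleaned = cleaned.str.replace(r'[A-Za-z\s]', '', regex=True)
parts = cleaned.str.split(';', expand=True).apply(pd.to_numeric, errors='coerce')
metric_m = parts.max(axis=1)

# Prefer the feet/inches result where the pattern matched.
df['height'] = imperial_m.fillna(metric_m)
print(df)
```

Everything here works on whole columns, so it should avoid explicit Python loops over the 2 million rows.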

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 