
Find string data-type that includes a number in Pandas DataFrame and change the type

I have a dataframe with multiple columns. One or more columns contain string values that may or may not represent numbers (integer or float).

import pandas as pd
import numpy as np

data = [('A', '>10', 'ABC'),
        ('B', '10', '15'),
        ('C', '<10', '>10'),
        ('D', '10', '15'),
        ('E', '10-20', '10-30'),
        ('F', '20.0', 'ABC'),
        ('G', '25.1', '30.1') ]

data_df = pd.DataFrame(data, columns = ['name', 'value1', 'value2'])

I am looking for a way to check every cell in the dataframe for values that are stored as strings but actually hold a number (integer or float), and to convert those values to integer or float while keeping the dataframe intact (not converting it to an array).

So far, I found the Stack Overflow article "How to find string data-type that includes a number in Pandas DataFrame" useful, but it is aimed at dropping the numeric values stored as strings.

If you need all values to be numeric, replace the non-numeric values with missing values:

data_df.iloc[:, 1:] = data_df.iloc[:, 1:].apply(pd.to_numeric, errors='coerce')
print (data_df)
  name value1 value2
0    A    NaN    NaN
1    B   10.0   15.0
2    C    NaN    NaN
3    D   10.0   15.0
4    E    NaN    NaN
5    F   20.0    NaN
6    G   25.1   30.1
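
The same coercion can also answer the "find" part of the question without changing anything. The following is a minimal sketch, not part of the original answer: it coerces a copy and treats every cell that did not become NaN as a numeric string. The names converted and is_numeric are illustrative, and the approach assumes the original columns contain no missing values of their own.

```python
import pandas as pd

data = [('A', '>10', 'ABC'),
        ('B', '10', '15'),
        ('C', '<10', '>10'),
        ('D', '10', '15'),
        ('E', '10-20', '10-30'),
        ('F', '20.0', 'ABC'),
        ('G', '25.1', '30.1')]
data_df = pd.DataFrame(data, columns=['name', 'value1', 'value2'])

# Coerce a copy of the value columns; cells that fail to parse become NaN.
converted = data_df[['value1', 'value2']].apply(pd.to_numeric, errors='coerce')

# True where the original string parsed as a number (illustrative name).
is_numeric = converted.notna()
print(is_numeric)
```

The mask leaves data_df untouched, so it can be inspected before deciding how to convert.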

If you need to restore the original strings where the conversion failed:

data_df.iloc[:, 1:] = (data_df.iloc[:, 1:]
                              .apply(pd.to_numeric, errors='coerce')
                              .fillna(data_df.iloc[:, 1:]))
print (data_df)
  name value1 value2
0    A    >10    ABC
1    B   10.0   15.0
2    C    <10    >10
3    D   10.0   15.0
4    E  10-20  10-30
5    F   20.0    ABC
6    G   25.1   30.1

But then the columns hold mixed types, with floats alongside strings:

print (data_df.iloc[:, 1:].applymap(type))
            value1           value2
0    <class 'str'>    <class 'str'>
1  <class 'float'>  <class 'float'>
2    <class 'str'>    <class 'str'>
3  <class 'float'>  <class 'float'>
4    <class 'str'>    <class 'str'>
5  <class 'float'>    <class 'str'>
6  <class 'float'>  <class 'float'>
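
pd.to_numeric turns every parsed value in these columns into a float, so '10' becomes 10.0. If integers should stay integers, as the question asks, one possible alternative is a per-cell converter. This is a sketch under assumptions rather than the answer's method: to_number is a hypothetical helper, and it relies on Python's own int() and float() parsing, which differs slightly from pd.to_numeric (for example, float() also accepts 'nan' and 'inf').

```python
import pandas as pd

def to_number(cell):
    """Return an int or float for a numeric string, otherwise the cell unchanged."""
    if not isinstance(cell, str):
        return cell
    text = cell.strip()
    try:
        return int(text)        # '10' -> 10
    except ValueError:
        pass
    try:
        return float(text)      # '20.0' -> 20.0
    except ValueError:
        return cell             # '>10', 'ABC', '10-20' stay strings

data = [('A', '>10', 'ABC'),
        ('B', '10', '15'),
        ('E', '10-20', '10-30'),
        ('F', '20.0', 'ABC'),
        ('G', '25.1', '30.1')]
data_df = pd.DataFrame(data, columns=['name', 'value1', 'value2'])

# Work on a copy and convert each value column cell by cell.
result = data_df.copy()
for col in ['value1', 'value2']:
    result[col] = result[col].map(to_number)

print(result)
```

The columns still hold mixed types, so this is slower and less convenient for arithmetic than a true numeric column. It only makes sense when the non-numeric strings must be preserved.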

EDIT: To convert every object column except name, stripping surrounding whitespace before parsing:

cols = data_df.select_dtypes(object).columns.difference(['name'], sort=False)
data_df[cols] = data_df[cols].apply(lambda x: pd.to_numeric(x.str.strip(), errors='coerce'))
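
To check the effect of this kind of column-label assignment, the dtypes can be inspected afterwards. The sketch below is an illustration only: it lists the columns explicitly in cols instead of using select_dtypes, which is an assumption made to keep the example short.

```python
import pandas as pd

data = [('A', '>10', 'ABC'),
        ('B', '10', '15'),
        ('C', '<10', '>10'),
        ('D', '10', '15'),
        ('E', '10-20', '10-30'),
        ('F', '20.0', 'ABC'),
        ('G', '25.1', '30.1')]
data_df = pd.DataFrame(data, columns=['name', 'value1', 'value2'])

# Explicit column list (assumption for this sketch).
cols = ['value1', 'value2']

# Strip whitespace, then coerce; unparseable cells become NaN.
data_df[cols] = data_df[cols].apply(
    lambda x: pd.to_numeric(x.str.strip(), errors='coerce'))

print(data_df.dtypes)
print(data_df[cols].isna().sum())
```

With numeric dtypes in place, aggregations such as data_df[cols].mean() work directly and skip the NaN cells.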
