How to find rows with column values having a particular datatype in a pandas DataFrame

Question

I have a dataframe

name    col1
satya    12
satya    abc
satya    109.12
alex     apple
alex     1000

Now I need to display the rows where column 'col1' has an int value in it. The output should look like:

name    col1
satya    12
alex     1000

If I search for string values:

name    col1
satya    abc
alex     apple

Likewise for other types. Please suggest some code (maybe using regex).

Answer 1

Let's start with a simple regex that will evaluate to True if you have an integer and False otherwise:

import re
regexp = re.compile('^-?[0-9]+$')
bool(regexp.match('1000'))
True
bool(regexp.match('abc'))
False

Once you have such a regex you can proceed as follows:

mask = df['col1'].map(lambda x: bool(regexp.match(x)) )
df.loc[mask]

    name    col1
0   satya   12
4   alex    1000

To search for strings you'll do:

regexp_str = re.compile('^[a-zA-Z]+$')
mask_str = df['col1'].map(lambda x: bool(regexp_str.match(x)))
df.loc[mask_str]

    name    col1
1   satya   abc
3   alex    apple

EDIT

The above code would work if the dataframe were created by:

df = pd.read_clipboard()

(or, alternatively, all variables were supplied as strings).

Whether the regex approach works depends on how the df was created. E.g., if it were created with:

df = pd.DataFrame({'name': ['satya','satya','satya', 'alex', 'alex'],
                   'col1': [12,'abc',109.12,'apple',1000] },
                   columns=['name','col1'])

the above code would fail with TypeError: expected string or bytes-like object

To make it work in any case, one would need to explicitly coerce the values to str:

mask = df['col1'].astype('str').map(lambda x: bool(regexp.match(x)) )
df.loc[mask]

    name    col1
0   satya   12
4   alex    1000

and the same for strings:

regexp_str = re.compile('^[a-zA-Z]+$')
mask_str = df['col1'].astype('str').map(lambda x: bool(regexp_str.match(x)))
df.loc[mask_str]

    name    col1
1   satya   abc
3   alex    apple
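
As a self-contained sketch of the steps above, the coercion and the regex test can be wrapped in one small helper. The name filter_by_pattern is made up for this illustration, and it uses re.fullmatch instead of the ^...$ anchors; treat it as one possible arrangement, not the only way to do it.

```python
import re

import pandas as pd

# Same mixed-type frame as in the EDIT above.
df = pd.DataFrame({'name': ['satya', 'satya', 'satya', 'alex', 'alex'],
                   'col1': [12, 'abc', 109.12, 'apple', 1000]},
                  columns=['name', 'col1'])


def filter_by_pattern(frame, column, pattern):
    """Return rows whose `column`, rendered as text, fully matches `pattern`.

    Hypothetical helper for illustration only.
    """
    compiled = re.compile(pattern)
    as_text = frame[column].astype(str)  # coerce so ints and floats can be matched too
    mask = as_text.map(lambda text: compiled.fullmatch(text) is not None)
    return frame.loc[mask]


int_rows = filter_by_pattern(df, 'col1', r'-?[0-9]+')   # rows 0 and 4
str_rows = filter_by_pattern(df, 'col1', r'[a-zA-Z]+')  # rows 1 and 3
```

The original values are left untouched; only the mask is computed on the string form.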

EDIT2

To find a float:

regexp_float = re.compile(r'^[-+]?[0-9]*(\.[0-9]+)$')  # raw string avoids invalid-escape warnings
mask_float = df['col1'].astype('str').map(lambda x: bool(regexp_float.match(x)))
df.loc[mask_float]

    name    col1
2   satya   109.12
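
To see what this float pattern does and does not accept, here is a quick check on a few sample strings. The sample values are my own, not from the question. Note that the pattern requires a decimal part, so plain integers and exponent notation are rejected.

```python
import re

regexp_float = re.compile(r'^[-+]?[0-9]*(\.[0-9]+)$')

# Illustrative inputs (assumed, not taken from the question's data).
samples = ['109.12', '-3.0', '.5', '12', '1.', '1e5', 'abc']
matches = {text: bool(regexp_float.match(text)) for text in samples}

for text, ok in matches.items():
    print(f'{text!r}: {ok}')
```

If values such as '1e5' should count as floats, the pattern would need to be extended, or a conversion-based check could be used instead.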

Answer 2

In pandas you would do something like this:

mask = df.col1.apply(lambda x: type(x) == int)
print(df[mask])

This yields the expected output, provided col1 holds actual Python int objects (an object-dtype column) rather than strings.
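
A minimal runnable sketch of the type-based mask, assuming the frame is built from mixed Python objects. The helper name is_plain_int is hypothetical; it excludes bool on purpose, because bool is a subclass of int in Python.

```python
import pandas as pd

df = pd.DataFrame({'name': ['satya', 'satya', 'satya', 'alex', 'alex'],
                   'col1': [12, 'abc', 109.12, 'apple', 1000]},
                  columns=['name', 'col1'])


def is_plain_int(value):
    """True for Python ints, but not for True/False."""
    return isinstance(value, int) and not isinstance(value, bool)


mask = df['col1'].map(is_plain_int)
int_rows = df[mask]
print(int_rows)
```

This checks the stored object's type, so the string '12' would not count as an integer here.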

Answer 3

You can check whether the value contains only digits:

In [104]: df
Out[104]:
    name    col1
0  satya      12
1  satya     abc
2  satya  109.12
3   alex   apple
4   alex    1000

Integers:

In [105]: df[~df.col1.str.contains(r'\D')]
Out[105]:
    name  col1
0  satya    12
4   alex  1000

Non-integers:

In [106]: df[df.col1.str.contains(r'\D')]
Out[106]:
    name    col1
1  satya     abc
2  satya  109.12
3   alex   apple
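
One thing to watch with the \D test: a leading minus sign is a non-digit, so negative integers are classified as non-integers. A possible alternative, sketched here on made-up data that includes '-7', is Series.str.fullmatch (available in pandas 1.1 and later) with an optional sign.

```python
import pandas as pd

# Hypothetical string-only data; '-7' is added to show the difference.
df = pd.DataFrame({'name': ['satya', 'satya', 'alex', 'alex'],
                   'col1': ['12', '-7', '109.12', 'apple']})

# Approach from above: True only when the value has no non-digit character.
no_non_digit = ~df['col1'].str.contains(r'\D')

# Alternative: the whole value must be an optional sign followed by digits.
signed_int = df['col1'].str.fullmatch(r'[-+]?[0-9]+')

print(df[no_non_digit])  # only '12'
print(df[signed_int])    # '12' and '-7'
```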

If you want to filter all numeric values (integers, floats, decimals), you can use pd.to_numeric(..., errors='coerce'):

In [75]: df
Out[75]:
    name    col1
0  satya      12
1  satya     abc
2  satya  109.12
3   alex   apple
4   alex    1000

In [76]: df[pd.to_numeric(df.col1, errors='coerce').notnull()]
Out[76]:
    name    col1
0  satya      12
2  satya  109.12
4   alex    1000

In [77]: df[pd.to_numeric(df.col1, errors='coerce').isnull()]
Out[77]:
    name   col1
1  satya    abc
3   alex  apple
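
Building on the to_numeric idea, here is a sketch of one way to separate whole numbers from fractional ones, assuming col1 holds strings. It tests the converted value, so a string like '12.0' would also count as whole under this approach.

```python
import pandas as pd

df = pd.DataFrame({'name': ['satya', 'satya', 'satya', 'alex', 'alex'],
                   'col1': ['12', 'abc', '109.12', 'apple', '1000']})

numeric = pd.to_numeric(df['col1'], errors='coerce')  # non-numeric text becomes NaN
is_number = numeric.notna()
is_whole = is_number & (numeric % 1 == 0)             # NaN compares as False

int_rows = df[is_whole]                 # rows 0 and 4
float_rows = df[is_number & ~is_whole]  # row 2
```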

Answer 4

You can define a helper function like the one below, then filter your items with a for loop.

def is_integer(element):
    try:
        int(element)  # raises ValueError for strings such as 'abc' or '109.12'
        return True
    except (TypeError, ValueError):
        return False

def list_str(list_of_data):
    str_list=[]
    for item in list_of_data: #list_of_data = [[names],[col1s]] if just col1s replace item[2] with item[1]
        if not is_integer(item[2]):
            str_list.append(item)
    return str_list

def list_int(list_of_data):
    int_list=[]
    for item in list_of_data:
        if is_integer(item[2]):
            int_list.append(item)
    return int_list
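
Note that int() also succeeds on a float such as 109.12 (it truncates to 109), so a float value would pass the check above. Below is a stricter sketch under the assumption that rows are (name, col1) tuples; the helper name is_integer_text and the row layout are made up for this example.

```python
def is_integer_text(element):
    """True for ints and integer-like strings ('12', '-7'); False for floats and other text."""
    if isinstance(element, bool):
        return False
    if isinstance(element, int):
        return True
    if isinstance(element, str):
        try:
            int(element)  # int() also tolerates surrounding whitespace
        except ValueError:
            return False
        return True
    return False  # floats and any other type


# Assumed row layout: (name, col1)
rows = [('satya', 12), ('satya', 'abc'), ('satya', 109.12),
        ('alex', 'apple'), ('alex', 1000)]

int_rows = [row for row in rows if is_integer_text(row[1])]
str_rows = [row for row in rows
            if isinstance(row[1], str) and not is_integer_text(row[1])]
```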

Hope this can help you

Answer 5

You can use df.applymap(np.isreal). (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.)

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [12,'abc',109.12,'apple',1000], 'name': ['satya','satya','satya', 'alex', 'alex']})
df
     col1   name
0      12  satya
1     abc  satya
2  109.12  satya
3   apple   alex
4    1000   alex

df2 = df[df.applymap(np.isreal)]
df2
     col1 name
0      12  NaN
1     NaN  NaN
2  109.12  NaN
3     NaN  NaN
4    1000  NaN

df2 = df2[df2.col1.notnull()]
df2
     col1 name
0      12  NaN
2  109.12  NaN
4    1000  NaN

index_list = df2.index.tolist()
index_list
[0, 2, 4]

df = df.iloc[index_list]
df
     col1   name
0      12  satya
2  109.12  satya
4    1000   alex
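
The same result can probably be reached in one step by building the mask on col1 alone. This sketch swaps np.isreal for an isinstance check against numbers.Real from the standard library, which is my substitution and not part of the answer above; the helper name is_real_number is hypothetical.

```python
import numbers

import pandas as pd

df = pd.DataFrame({'col1': [12, 'abc', 109.12, 'apple', 1000],
                   'name': ['satya', 'satya', 'satya', 'alex', 'alex']})


def is_real_number(value):
    """True for int and float values, False for strings and booleans."""
    return isinstance(value, numbers.Real) and not isinstance(value, bool)


numeric_rows = df[df['col1'].map(is_real_number)]
print(numeric_rows)
```

Because the mask is applied directly to df, the name column keeps its values and no index round-trip is needed.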

5 answers

solution1
2 ACCEPTED 2016-04-03 07:18:23

solution2
1 2016-04-03 07:32:36

solution3
1 2016-04-03 16:14:38

solution4
0 2016-04-03 06:57:11

solution5
0 2016-04-03 07:47:21
