简体   繁体   中英

How to find rows with column values having a particular datatype in a Pandas DATAFRAME

I have a dataframe

name    col1
satya    12
satya    abc
satya    109.12
alex     apple
alex     1000

So now i need to display the rows where column 'col1' has int value in it.O/p looks like

name    col1
satya    12
alex     1000

if search for string value

name    col1
satya    abc
alex     apple

Like wise..please suggest some code lines(may be using reg).

Let's start with a simple regex that will evaluate to True if you have an integer and False otherwise:

import re
regexp = re.compile('^-?[0-9]+$')
bool(regexp.match('1000'))
True
bool(regexp.match('abc'))
False

Once you have such a regex you can proceed as follows:

mask = df['col1'].map(lambda x: bool(regexp.match(x)) )
df.loc[mask]

    name    col1
0   satya   12
4   alex    1000

To search for strings you'll do:

regexp_str = re.compile('^[a-zA-Z]+$')
mask_str = df['col1'].map(lambda x: bool(regexp_str.match(x)))
df.loc[mask_str]

    name    col1
1   satya   abc
3   alex    apple

EDIT

The above code would work if dataframe were created by:

df = pd.read_clipboard()

(or, alternatively, all variables were supplied as strings).

If the regex approach works depends on how the df was created. Eg, if it were created with:

df = pd.DataFrame({'name': ['satya','satya','satya', 'alex', 'alex'],
                   'col1': [12,'abc',109.12,'apple',1000] },
                   columns=['name','col1'])

the above code would fail with TypeError: expected string or bytes-like object

To make it work in any case, one would need to explicitly coerce type to str :

mask = df['col1'].astype('str').map(lambda x: bool(regexp.match(x)) )
df.loc[mask]

    name    col1
0   satya   12
4   alex    1000

and the same for strings:

regexp_str = re.compile('^[a-zA-Z]+$')
mask_str = df['col1'].astype('str').map(lambda x: bool(regexp_str.match(x)))
df.loc[mask_str]

    name    col1
1   satya   abc
3   alex    apple

EDIT2

To find a float:

regexp_float = re.compile('^[-\+]?[0-9]*(\.[0-9]+)$')
mask_float = df['col1'].astype('str').map(lambda x: bool(regexp_float.match(x)))
df.loc[mask_float]

    name    col1
2   satya   109.12

In pandas you would do something like this:

mask = df.col1.apply(lambda x: type(x) == int)
print df[mask]

Which would yield your expected output.

You can check whether the value contains only digits:

In [104]: df
Out[104]:
    name    col1
0  satya      12
1  satya     abc
2  satya  109.12
3   alex   apple
4   alex    1000

Integers:

In [105]: df[~df.col1.str.contains(r'\D')]
Out[105]:
    name  col1
0  satya    12
4   alex  1000

Non-integers:

In [106]: df[df.col1.str.contains(r'\D')]
Out[106]:
    name    col1
1  satya     abc
2  satya  109.12
3   alex   apple

if you want to filter all numeric values (integers/float/decimal) you can use pd.to_numeric(..., errors='coerce') :

In [75]: df
Out[75]:
    name    col1
0  satya      12
1  satya     abc
2  satya  109.12
3   alex   apple
4   alex    1000

In [76]: df[pd.to_numeric(df.col1, errors='coerce').notnull()]
Out[76]:
    name    col1
0  satya      12
2  satya  109.12
4   alex    1000

In [77]: df[pd.to_numeric(df.col1, errors='coerce').isnull()]
Out[77]:
    name   col1
1  satya    abc
3   alex  apple
def is_integer(element):
    try:
        int(element) #if this is str then there will be error
        return 1
    except:
        return 0

You can simply define a function as below then list your items with for loop.

def list_str(list_of_data):
    str_list=[]
    for item in list_of_data: #list_of_data = [[names],[col1s]] if just col1s replace item[2] with item[1]
        if not is_integer(item[2]):
            str_list.append(item)
    return str_list

def list_int(list_of_data):
    int_list=[]
    for item in list_of_data:
        if is_integer(item[2]):
            int_list.append(item)
    return int_list

Hope this can help you

You can use df.applymap(np.isreal)

df = pd.DataFrame({'col1': [12,'abc',109.12,'apple',1000], 'name': ['satya','satya','satya', 'alex', 'alex']})
df
col1    name
0   12  satya
1   abc     satya
2   109.12  satya
3   apple   alex
4   1000    alex

df2 = df[df.applymap(np.isreal)]
df2
col1    name
0   12  NaN
1   NaN     NaN
2   109.12  NaN
3   NaN     NaN
4   1000    NaN

df2 = df2[df2.col1.notnull()]
df2
col1    name
0   12  NaN
2   109.12  NaN
4   1000    NaN

index_list = df2.index.tolist()
index_list
[0, 2, 4]

df = df.iloc[index_list]
df
col1    name
0   12  satya
2   109.12  satya
4   1000    alex

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM