
Applying a function to a pandas DataFrame

I'm trying to perform some text analysis on a pandas dataframe, but am having some trouble with the flow. Alternatively, maybe I'm just not getting it... PS: I'm a Python beginner-ish.

Dataframe example:

df = pd.DataFrame({'Document' : ['a','1','a', '6','7','N'], 'Type' : ['7', 'E', 'Y', '6', 'C', '9']})


     Document   Type
0    a          7
1    1          E
2    a          Y
3    6          6
4    7          C
5    N          9

I'm trying to build a flow that does something depending on whether 'Document' or 'Type' is a number.

Here is a simple function to return whether 'Document' is a number (edited to show how I am trying some if/then flow on the field):

def fn(dfname):
    if dfname['Document'].apply(str.isdigit):
        dfname['Check'] = 'Y'
    else:
        dfname['Check'] = 'N'

Now, I apply it to the dataframe:

df.apply(fn(df), axis=0)

I get this error back:

TypeError: ("'NoneType' object is not callable", u'occurred at index Document')

From the error message, it looks like I am not handling the index correctly. Can anyone see where I am going wrong?

Lastly, this may or may not be related to the issue, but I am really struggling with how indexes work in pandas. I think I have run into more issues with the index than with anything else.

You're close.

The thing you have to realize about apply on a single column is that you need to write a function that operates on a scalar value and returns the result that you want. With that in mind:

import pandas as pd

df = pd.DataFrame({'Document' : ['a','1','a', '6','7','N'], 'Type' : ['7', 'E', 'Y', '6', 'C', '9']})

def fn(val):
    if str(val).isdigit():
        return 'Y'
    else:
        return 'N'

df['check'] = df['Document'].apply(fn)

gives me:

  Document Type check
0        a    7     N
1        1    E     Y
2        a    Y     N
3        6    6     Y
4        7    C     Y
5        N    9     N

Edit:

Just want to clarify that when using apply on a Series, you should write functions that accept scalar values. When using apply on a DataFrame, however, the function should accept either a full column (when axis=0, the default) or a full row (when axis=1).
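To make the difference concrete, here is a minimal sketch of the three cases on the example data. The helper name either_is_digit and the result variable names are made up for illustration; the row-wise case is just one way to cover the "'Document' or 'Type'" condition from the question.

```python
import pandas as pd

df = pd.DataFrame({'Document': ['a', '1', 'a', '6', '7', 'N'],
                   'Type': ['7', 'E', 'Y', '6', 'C', '9']})

# Series.apply: the function receives one scalar value at a time.
doc_check = df['Document'].apply(lambda val: 'Y' if str(val).isdigit() else 'N')

# DataFrame.apply with axis=0 (the default): the function receives each
# column as a Series; here it returns one count per column.
digits_per_column = df.apply(lambda col: col.str.isdigit().sum(), axis=0)

# DataFrame.apply with axis=1: the function receives each row as a Series,
# so it can inspect several fields of the same row.
def either_is_digit(row):  # hypothetical helper, not part of pandas
    return 'Y' if row['Document'].isdigit() or row['Type'].isdigit() else 'N'

either_check = df.apply(either_is_digit, axis=1)
```

Note that in every case the function itself is passed to apply (fn, not fn(df)), and it returns a value rather than assigning into the dataframe.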

It's worth noting that you can do this without apply, and so more efficiently, using the vectorized str.contains:

In [11]: df['Document'].str.contains(r'^\d+$')
Out[11]: 
0    False
1     True
2    False
3     True
4     True
5    False
Name: Document, dtype: bool

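Here the regex anchors ^ and $ match the start and end of the string respectively, and \d+ matches one or more digits, so only values made up entirely of digits give True.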
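If you want the 'Y'/'N' column from the question rather than booleans, one possible way is to feed the mask to numpy.where. This is only a sketch: the use of numpy and the column name check are my assumptions, not something the question requires.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Document': ['a', '1', 'a', '6', '7', 'N'],
                   'Type': ['7', 'E', 'Y', '6', 'C', '9']})

# Boolean mask: True where the whole string consists of digits.
mask = df['Document'].str.contains(r'^\d+$')

# Turn True/False into 'Y'/'N' for the whole column in one vectorized step.
df['check'] = np.where(mask, 'Y', 'N')
```

For this data, df['Document'].str.isdigit() should give the same mask without a regex.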
