Applying a function to pandas dataframe

I'm trying to perform some text analysis on a pandas dataframe, but I'm having some trouble with the flow. Or maybe I'm just not getting it... PS: I'm a Python beginner-ish.

Dataframe example:

import pandas as pd

df = pd.DataFrame({'Document' : ['a','1','a', '6','7','N'], 'Type' : ['7', 'E', 'Y', '6', 'C', '9']})


     Document   Type
0    a          7
1    1          E
2    a          Y
3    6          6
4    7          C
5    N          9

I'm trying to build a flow that does something different depending on whether 'Document' or 'Type' is a number.

Here is a simple function that checks whether 'Document' is a number (edited to show how I'm trying to use some if/then flow on the field):

def fn(dfname):
    if dfname['Document'].apply(str.isdigit):
        dfname['Check'] = 'Y'
    else:
        dfname['Check'] = 'N'

Now, I apply it to the dataframe:

df.apply(fn(df), axis=0)

I get this error back:

TypeError: ("'NoneType' object is not callable", u'occurred at index Document')

From the error message, it looks like I am not handling the index correctly. Can anyone see where I am going wrong?

Lastly, this may or may not be related to the issue, but I am really struggling with how indexes work in pandas. I think I have run into more problems with the index than with anything else.

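You're close. Two things are going wrong in your version. First, df.apply(fn(df), axis=0) calls fn(df) right away and hands its return value to apply; since fn has no return statement, apply receives None, which is what "'NoneType' object is not callable" is telling you. The "occurred at index Document" part only names the column apply was processing when it failed, so this isn't really an index problem. Second, if dfname['Document'].apply(str.isdigit): puts a whole boolean Series in an if statement, which has no single truth value (recent pandas versions raise a ValueError for this).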

The thing you have to realize about apply is that you need to write functions that operate on scalar values and return the result you want. With that in mind:

import pandas as pd

df = pd.DataFrame({'Document' : ['a','1','a', '6','7','N'], 'Type' : ['7', 'E', 'Y', '6', 'C', '9']})

def fn(val):
    if str(val).isdigit():
        return 'Y'
    else:
        return 'N'

df['check'] = df['Document'].apply(fn)

gives me:

  Document Type check
0        a    7     N
1        1    E     Y
2        a    Y     N
3        6    6     Y
4        7    C     Y
5        N    9     N
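Since the question is about both 'Document' and 'Type', here is a minimal sketch that reuses the same scalar fn on each column in a loop. The result column names (Document_check, Type_check) are a made-up naming scheme for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'Document': ['a', '1', 'a', '6', '7', 'N'],
                   'Type': ['7', 'E', 'Y', '6', 'C', '9']})

def fn(val):
    # Called once per cell by Series.apply; val is a single scalar.
    if str(val).isdigit():
        return 'Y'
    else:
        return 'N'

# Pass fn itself (not fn(...)) so apply can call it for every element.
for col in ['Document', 'Type']:
    # Hypothetical naming scheme for the result columns.
    df[col + '_check'] = df[col].apply(fn)

print(df)
```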

Edit:

I just want to clarify that when using apply on a Series, you should write functions that accept scalar values. When using apply on a DataFrame, however, the function receives either full columns (when axis=0, the default) or full rows (when axis=1), each passed in as a Series.
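To make the axis distinction concrete, here is a sketch on the same example frame: with axis=0 the function gets each column as a Series indexed by the row labels, and with axis=1 it gets each row as a Series indexed by the column names, with row.name holding that row's index label. The helper names count_digits and classify_row, and the 'numeric' column, are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Document': ['a', '1', 'a', '6', '7', 'N'],
                   'Type': ['7', 'E', 'Y', '6', 'C', '9']})

def count_digits(col):
    # axis=0: col is a whole column (a Series indexed by the row labels).
    return col.str.isdigit().sum()

def classify_row(row):
    # axis=1: row is one row; row['Document'] and row['Type'] are scalars,
    # and row.name holds that row's index label (0, 1, 2, ...).
    doc_num = row['Document'].isdigit()
    type_num = row['Type'].isdigit()
    if doc_num and type_num:
        return 'both'
    if doc_num or type_num:
        return 'one'
    return 'neither'

# One result per column, indexed by column name.
digit_counts = df.apply(count_digits, axis=0)

# One result per row, aligned with the DataFrame's row index.
df['numeric'] = df.apply(classify_row, axis=1)
```

Because the axis=1 result is indexed like df, assigning it back as a new column lines up row by row without any manual index handling.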

It's worth noting that you can do this without using apply (and therefore more efficiently) using str.contains:

In [11]: df['Document'].str.contains(r'^\d+$')
Out[11]: 
0    False
1     True
2    False
3     True
4     True
5    False
Name: Document, dtype: bool

Here the regex anchors ^ and $ match the start and end of the string respectively, so the whole value must consist of one or more digits (\d+).
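Going one step further, the boolean mask can be turned into the 'Y'/'N' column without apply at all. This sketch uses numpy.where for the mapping, which is one choice among several (Series.map with a {True: 'Y', False: 'N'} dict would also work), and checks that the string accessor's isdigit gives the same mask as the regex for this data. The pattern is written as a raw string so Python leaves \d for the regex engine.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Document': ['a', '1', 'a', '6', '7', 'N'],
                   'Type': ['7', 'E', 'Y', '6', 'C', '9']})

# Vectorized test: True where the entire string is one or more digits.
# The raw string r'...' keeps \d as a regex class rather than a Python escape.
mask = df['Document'].str.contains(r'^\d+$')

# For plain ASCII data like this, str.isdigit() gives the same mask.
mask_isdigit = df['Document'].str.isdigit()

# Map True/False to 'Y'/'N' in one vectorized step.
df['check'] = np.where(mask, 'Y', 'N')
```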

Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.