
How can I apply a function to a dataframe which needs a row index in Pandas?

I have to use survey data from IPUMS to get the average number of people who are unemployed in two successive periods. I wrote a function that takes an index label and a dataframe as input:

def u1(x, df):
    if df.loc[x]['LABFORCE'] == 2 and df.loc[x]['CPSIDP'] == df.loc[x+1]['CPSIDP']:
        if df.loc[x]['EMPSTAT'] == 21 or df.loc[x]['EMPSTAT'] == 22:
            return True
    else:
        return False

where x is the index and df is the dataframe. CPSIDP identifies the survey respondent, LABFORCE indicates whether the respondent is in the labor force, and EMPSTAT is what I need to check the respondent's employment status.

I then planned to use apply as follows:

result= df.apply(u1, axis=1)

It is not clear what arguments I should pass to my function (and please let me know if this approach is just philosophically wrong). Passing a number or a variable for the index gives me a "'bool' object is not callable" error.

The smallest dataframe subset that generates the error (the leftmost column is the observation number; it is the x I need to pass to u1):

          YEAR  MONTH          CPSIDP  EMPSTAT  LABFORCE
15285896  2018      7  20180707096701       10         2
15285926  2018      7  20180707098301       10         2
15285927  2018      7  20180707098302       10         2
15285928  2018      7  20180707098303        0         0
15285929  2018      7  20180707098304        0         0
15285930  2018      7  20180707098305       10         2
15286095  2018      7  20180707108203       21         2
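
On the literal question of which arguments to pass: with axis=1, apply calls the function once per row and hands it that row as a Series. The row's index label is available as row.name, and extra positional arguments go through args. A "'bool' object is not callable" error is typically what appears when apply receives the result of a call such as u1(x, df) instead of the function itself. A minimal sketch using the sample rows above follows. The name u1_row and the positional lookup of the following row are assumptions for illustration, not the asker's code.

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame(
    {
        "YEAR": [2018] * 7,
        "MONTH": [7] * 7,
        "CPSIDP": [
            20180707096701, 20180707098301, 20180707098302, 20180707098303,
            20180707098304, 20180707098305, 20180707108203,
        ],
        "EMPSTAT": [10, 10, 10, 0, 0, 10, 21],
        "LABFORCE": [2, 2, 2, 0, 0, 2, 2],
    },
    index=[15285896, 15285926, 15285927, 15285928, 15285929, 15285930, 15286095],
)


def u1_row(row, frame):
    """Hypothetical row-wise variant of u1: `row` is one row, `frame` the whole table."""
    pos = frame.index.get_loc(row.name)  # row.name is the row's index label
    if pos + 1 >= len(frame):
        return False  # the last row has no following row to compare with
    same_person = row["CPSIDP"] == frame["CPSIDP"].iloc[pos + 1]
    unemployed = row["EMPSTAT"] in (21, 22)
    return bool(row["LABFORCE"] == 2 and same_person and unemployed)


# Pass the function itself (no parentheses); extra arguments go in `args`.
result = df.apply(u1_row, axis=1, args=(df,))
```

This looks up the following row by position rather than by label x+1, because the index labels in the sample are not contiguous. It assumes the index labels are unique.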

IIUC, it would be more efficient to create a boolean Series using the logic from your function.

Here & is the element-wise AND operator.

result = (df['LABFORCE'].eq(2) & 
           df['CPSIDP'].eq(df['CPSIDP'].shift()) & 
           df['EMPSTAT'].isin([21,22]))

result

15285896    False
15285926    False
15285927    False
15285928    False
15285929    False
15285930    False
15286095    False
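
One detail that may matter: shift() with no argument compares each row with the previous one, whereas the function in the question looks at the following row (x+1). If the following row is what is wanted, shift(-1) is the counterpart. A small sketch of the difference, using made-up rows (the IDs and values below are hypothetical, chosen so that exactly one row qualifies):

```python
import pandas as pd

# Hypothetical rows, not from the survey: person 101 appears in two consecutive rows.
toy = pd.DataFrame(
    {
        "CPSIDP": [101, 101, 102, 103],
        "EMPSTAT": [21, 10, 22, 21],
        "LABFORCE": [2, 2, 2, 2],
    }
)

in_labor_force = toy["LABFORCE"].eq(2)
unemployed = toy["EMPSTAT"].isin([21, 22])

# shift() moves values down one row, so each row is compared with the row above it.
same_as_prev = toy["CPSIDP"].eq(toy["CPSIDP"].shift())
# shift(-1) moves values up one row, so each row is compared with the row below it.
same_as_next = toy["CPSIDP"].eq(toy["CPSIDP"].shift(-1))

result_prev = in_labor_force & same_as_prev & unemployed
result_next = in_labor_force & same_as_next & unemployed

# The mean of a boolean Series is the fraction of rows flagged True.
share_next = result_next.mean()
```

Only the first row is flagged in result_next, and no row is flagged in result_prev, so the direction of the shift changes the answer. Which one is right depends on how the survey rows are ordered.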

