
Applying a function to a list of dataframes in Python

Beginner Python question here, one that I've struggled to get answered from related Stack Overflow questions.

I've got a list:

dfList = [df0, df1, df2, ..., df7]

I've defined a function that takes a dataframe as its argument. I'm not sure the function itself matters, but to be safe, it is basically:

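from statsmodels.stats.proportion import proportion_confint

def rateCalc(outcomeDataFrame):
    # Column position 4 holds the success count, position 3 the number of observations
    rateList = list()
    upperRateList = list()
    lowerRateList = list()
    for i in range(len(outcomeDataFrame)):
        # 95% confidence interval (proportion_confint's default alpha is 0.05)
        lowlevel, highlevel = proportion_confint(count=outcomeDataFrame.iloc[i, 4], nobs=outcomeDataFrame.iloc[i, 3])
        lowerRateList.append(lowlevel)
        # Observed success ratio
        rateList.append(outcomeDataFrame.iloc[i, 4] / outcomeDataFrame.iloc[i, 3])
        upperRateList.append(highlevel)

    # Attach the three lists as new columns; assign returns a new DataFrame
    outcomeDataFrame = outcomeDataFrame.assign(lowerRate=lowerRateList)
    outcomeDataFrame = outcomeDataFrame.assign(midrate=rateList)
    outcomeDataFrame = outcomeDataFrame.assign(upperRate=upperRateList)

    return outcomeDataFrame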

What I'm trying to do is append the observed success ratio of two numbers, as well as its 95% confidence interval. It works fine when I run it on any individual df.
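Since the statistic is computed row by row, the Python-level loop can be replaced with whole-column arithmetic. A minimal sketch, assuming (as in rateCalc) that column position 4 holds the success count and position 3 the number of observations; rate_calc_vectorized and the example column names are hypothetical. To keep it runnable without statsmodels, it computes the normal-approximation (Wald) 95% interval by hand, which as far as I know is the method proportion_confint uses by default (method="normal", alpha=0.05); statsmodels may differ slightly in details such as how bounds near 0 or 1 are handled. proportion_confint also accepts array-like count and nobs, so passing the two columns to it directly is another option.

```python
import pandas as pd

# Two-sided 95% critical value of the standard normal distribution
Z_95 = 1.959963984540054

def rate_calc_vectorized(outcome: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `outcome` with lowerRate, midrate and upperRate columns.

    Assumes column position 4 is the success count and position 3 is the
    number of observations, mirroring the iloc indices used in rateCalc.
    """
    count = outcome.iloc[:, 4]
    nobs = outcome.iloc[:, 3]
    rate = count / nobs
    # Wald half-width z * sqrt(p * (1 - p) / n), computed for all rows at once
    half_width = Z_95 * (rate * (1 - rate) / nobs) ** 0.5
    # assign returns a new DataFrame; the input frame is left unchanged
    return outcome.assign(
        lowerRate=rate - half_width,
        midrate=rate,
        upperRate=rate + half_width,
    )

example = pd.DataFrame(
    {"a": [0, 0], "b": [0, 0], "c": [0, 0], "nobs": [100, 50], "count": [50, 10]}
)
result = rate_calc_vectorized(example)
print(result[["lowerRate", "midrate", "upperRate"]])
```

Like the original, this returns a new frame, so the caller still has to keep the return value.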

What I want to accomplish is to turn each item of dfList into a version of itself with those lowerRate, midrate, and upperRate values appended as new columns.

When I try to apply it across each dataframe with

for i in range(len(dfList)):
    rateCalc(dfList[i])

though, it seems to only execute for df0. I can't make any sense of that; if it failed with an outright error I'd assume I had some basic flaw in the code, but it seems to work for df0 and then not iterate to df1 and beyond.

I also thought there might be an issue of "df1 != dfList[1]" in some under-the-hood sense (i.e. that running the function on the list item dfList[1] would have no effect on the original df1), but again, the fact that it seems to work for df0 would imply that's not the issue.
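The df1 versus dfList[1] suspicion is close to what matters. A minimal sketch with made-up frames (the names and the x and y columns are illustrative only): a list holds references to the same objects as the original names, so an in-place change shows up under both, while storing a new DataFrame in a list slot leaves the original name pointing at the old one.

```python
import pandas as pd

df0 = pd.DataFrame({"x": [1]})
df1 = pd.DataFrame({"x": [2]})
dfList = [df0, df1]

# The list stores references: dfList[1] and df1 are the same object
print(dfList[1] is df1)  # True

# Rebinding the list slot to a new DataFrame leaves the name df1 untouched
dfList[1] = dfList[1].assign(y=[20])
print(dfList[1] is df1)    # False
print("y" in df1.columns)  # False

# Mutating an object in place is visible through every reference to it
df0["y"] = [10]
print("y" in dfList[0].columns)  # True
```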

I also tried throwing some mud at the wall with the "map" function, but I'm not sure I understand how to use it in this context (or any other, for that matter, ha).
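The loop in the question does call the function for every frame; it just throws each returned DataFrame away. A minimal sketch of three ways to keep the results, using a hypothetical add_ratio function with made-up columns (nobs, count) as a stand-in for rateCalc so it runs without statsmodels:

```python
import pandas as pd

def add_ratio(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for rateCalc: like rateCalc, it returns a NEW DataFrame
    return df.assign(ratio=df["count"] / df["nobs"])

originals = [
    pd.DataFrame({"nobs": [10, 20], "count": [5, 4]}),
    pd.DataFrame({"nobs": [8], "count": [2]}),
]

# Option 1: keep the index loop, but store each result back into the list
fixed_loop = list(originals)  # shallow copy, so `originals` stays as it was
for i in range(len(fixed_loop)):
    fixed_loop[i] = add_ratio(fixed_loop[i])

# Option 2: a list comprehension that collects the returned frames
fixed_comp = [add_ratio(df) for df in originals]

# Option 3: map() calls the function on each item; list() collects the results
fixed_map = list(map(add_ratio, originals))

print(fixed_map[1])
```

With the question's names, the comprehension would read dfList = [rateCalc(df) for df in dfList]. In Python 3, map returns a lazy iterator, which is why it is wrapped in list(). Option 1 needs a real list, since a tuple does not support item assignment. In every case only the list receives the new frames; the separate variables df0, df1 and so on still refer to the originals unless you reassign them as well.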

Thanks all

I think it is because the assign method returns another DataFrame, which only exists inside the function scope unless you keep the function's return value. Here is an example:

import pandas as pd

df_0 = pd.DataFrame(data=[{'column': 'a'}])
df_1 = pd.DataFrame(data=[{'column': 'c'}])
df_2 = pd.DataFrame(data=[{'column': 'd'}])
df_altos = df_0, df_1, df_2  # a tuple of references to the three frames

def mod_df(df):
    test = list()
    test.append('d')
    #print('id before setting another column ' + str(id(df)))
    #df['b'] = test
    print('id before assigning ' + str(id(df)))
    # assign leaves df untouched and returns a new DataFrame, rebound to the local name df
    df = df.assign(lowerRate=test)
    print('id after  assigning ' + str(id(df)))
    return df

# The return value is discarded, so each new DataFrame is lost after the call
for i in range(len(df_altos)):
    mod_df(df_altos[i])

This prints the id of each DataFrame before and after assign:

id before assigning 1833832455136
id after  assigning 1833832523568
id before assigning 1833832456144
id after  assigning 1833832525776
id before assigning 1833832454416
id after  assigning 1833832521888

As you can see, the id changes. You could try another assignment method instead, such as the following:

def mod_df(df):
    test = list()
    test.append('d')
    print('id before setting another column ' + str(id(df)))
    # Setting a column with [] modifies the existing DataFrame in place
    df['b'] = test
    print('id after assigning ' + str(id(df)))
    return df

which outputs:

id before setting another column 1833831955520
id after assigning 1833831955520
id before setting another column 1833791973888
id after assigning 1833791973888
id before setting another column 1833791973264
id after assigning 1833791973264

Now the ids are the same, and the new column exists on all the dataframes. How the first dataframe in your code was working, I don't know.
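The same in-place idea can be applied to the question's function. A minimal sketch, not the answerer's code: rate_calc_inplace is a hypothetical name, it keeps the question's assumption that positions 3 and 4 hold the number of observations and the success count, and it swaps proportion_confint for a hand-computed normal-approximation 95% interval so it runs without statsmodels. Because it adds the columns with df[...] = ..., the original index loop updates df0, df1 and the rest directly.

```python
import pandas as pd

Z_95 = 1.959963984540054  # two-sided 95% standard-normal critical value

def rate_calc_inplace(outcome: pd.DataFrame) -> None:
    """Hypothetical in-place variant: adds the columns to `outcome` itself."""
    count = outcome.iloc[:, 4]
    nobs = outcome.iloc[:, 3]
    rate = count / nobs
    half_width = Z_95 * (rate * (1 - rate) / nobs) ** 0.5
    # Column assignment with [] mutates the existing object (its id stays the same)
    outcome["lowerRate"] = rate - half_width
    outcome["midrate"] = rate
    outcome["upperRate"] = rate + half_width

df0 = pd.DataFrame({"a": [0], "b": [0], "c": [0], "nobs": [40], "count": [10]})
df1 = pd.DataFrame({"a": [0], "b": [0], "c": [0], "nobs": [25], "count": [5]})
dfList = [df0, df1]

# The question's loop shape now works, because nothing needs to be returned
for i in range(len(dfList)):
    rate_calc_inplace(dfList[i])

print(df1[["lowerRate", "midrate", "upperRate"]])
```

One caveat: if a frame in the list is itself a slice of a larger DataFrame, in-place column assignment can trigger pandas' SettingWithCopyWarning; calling .copy() on each frame when building the list avoids that.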

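Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.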

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM