Python: DataFrame as function input, and get another DataFrame with a new name as output

I have a dataframe df with lots of processing on different rows and columns. Eventually I'd like to get a new df called, e.g., processed_df. This is what I have done:

import pandas as pd
import numpy as np

def foofunc(df):
    name = [x for x in globals() if globals()[x] is df][0]  # get df name as string
    output_df = 'processed_' + str(name)

    output_df = df.head(2)  # e.g. a processing step; in reality ~50 operations
    print(f'output dataframe name is: {str(output_df)}')  # expect to get: processed_df
    return output_df

testdf = pd.DataFrame(np.random.randint(0, 100, size=(5, 2)), columns=list('AB'))
foofunc(testdf)  # expect to get processed_testdf

processed_df

Then, on the last line, I get the error:

NameError: name 'processed_df' is not defined

To be clearer, this is part of a pipeline, so I'd just like to pass in a df and get the processed one back under a desired name. In general, is my approach good practice for doing such operations on dataframes?

Thank you!

I don't see a good reason to have a function auto-generate a name and put its result into the global namespace when Python already binds function results to names. After that name has been generated, how would another piece of code know what it is called? And suppose the input df wasn't in the function's global namespace, so its global name (or one of its global names, if it has multiple references) can't be found?
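To make the last two concerns concrete, here is a minimal sketch (the helper name find_global_names is hypothetical) that runs the same identity-based lookup over globals() that the question uses. A second reference to the same DataFrame yields two candidate names, and a DataFrame that only lives in a local variable yields none, so indexing the result with [0] would raise IndexError.

```python
import numpy as np
import pandas as pd


def find_global_names(df):
    """Return every module-level name bound to exactly this object (identity check)."""
    return [name for name, value in globals().items() if value is df]


testdf = pd.DataFrame(np.random.randint(0, 100, size=(5, 2)), columns=list("AB"))
alias = testdf  # a second global name for the very same object

print(find_global_names(testdf))  # ['testdf', 'alias']: which one is "the" name?


def make_local_and_look_up():
    local_df = pd.DataFrame({"A": [1, 2]})  # never stored in globals()
    return find_global_names(local_df)


print(make_local_and_look_up())  # []: taking [0] of this would raise IndexError
```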

There are many ways to write a pipeline, the easiest being:

df = do_thing_1(df)
df = do_thing_2(df)
...

This has the advantage that the caller gets to decide the name. It also drops the references to intermediate dataframes, so they can be garbage-collected instead of consuming memory.
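A slightly more structured version of the same pattern is sketched here, assuming the processing can be split into small functions (keep_first_rows, add_total and process are hypothetical placeholders for the real ~50 operations). pandas' DataFrame.pipe passes the frame through each step, and the caller binds the final result to whatever name it likes.

```python
import numpy as np
import pandas as pd


# Hypothetical processing steps standing in for the real operations.
def keep_first_rows(df, n=2):
    return df.head(n)


def add_total(df):
    # assign() returns a new DataFrame and leaves its input untouched
    return df.assign(total=df["A"] + df["B"])


def process(df):
    """Run the whole pipeline and return the result; the caller picks the name."""
    return df.pipe(keep_first_rows, n=2).pipe(add_total)


testdf = pd.DataFrame(np.random.randint(0, 100, size=(5, 2)), columns=list("AB"))
processed_testdf = process(testdf)  # the name is chosen here, by the caller
print(processed_testdf)
```

Because each step returns a new object, testdf itself still has its original five rows and two columns.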

That said, your problem is that you don't assign the result back to the global namespace... and you use the wrong name for the generated dataframe: with testdf as input it would be processed_testdf, not processed_df (which brings us back to the "how do you know what the name is" problem). A solution is:

import pandas as pd
import numpy as np

def foofunc(df):
    # first global name bound to this exact object; raises IndexError if there is none
    name = [x for x in globals() if globals()[x] is df][0]
    output_df_name = 'processed_' + name

    output_df = df.head(2)  # e.g. a processing step; in reality ~50 operations
    print(f'output dataframe name is: {output_df_name}')  # prints: processed_testdf
    globals()[output_df_name] = output_df  # bind the result to the generated global name

testdf = pd.DataFrame(np.random.randint(0, 100, size=(5, 2)), columns=list('AB'))
foofunc(testdf)  # creates the global processed_testdf

processed_testdf
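If outputs really must be looked up by a generated name, one hedged alternative (not part of the original answer) is to keep them in an ordinary dict that the caller owns, so "what is it called?" has an explicit answer and no name has to be recovered from globals(). The names process, process_all and frames, and the default prefix, are hypothetical.

```python
import numpy as np
import pandas as pd


def process(df):
    # Stand-in for the real processing steps
    return df.head(2)


def process_all(named_frames, prefix="processed_"):
    """Process each DataFrame and return a new dict keyed by prefixed names."""
    return {prefix + name: process(df) for name, df in named_frames.items()}


frames = {
    "testdf": pd.DataFrame(np.random.randint(0, 100, size=(5, 2)), columns=list("AB")),
}
results = process_all(frames)
print(list(results))  # ['processed_testdf']
print(results["processed_testdf"])
```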
