
Returning a dataframe in a Python function

I am trying to create and return a data frame from a Python function:

import pandas as pd

def create_df():
    data = {'state': ['Ohio','Ohio','Ohio','Nevada','Nevada'],
            'year': [2000,2001,2002,2001,2002],
            'pop': [1.5,1.7,3.6,2.4,2.9]}
    df = pd.DataFrame(data)
    return df

create_df()
df

I get an error saying that df is not defined. If I replace return with print, the data frame prints correctly. Is there a way to do this?

When you call create_df(), Python runs the function but doesn't save the result in any variable. The name df exists only inside the function, which is why you got the error.

Assign the result of create_df() to a new variable df like this:

df = create_df()
df
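
To see why the bare call fails, here is a minimal sketch (assuming pandas is installed; the names result and name_defined are only illustrative). The df inside the function is local and disappears when the function returns, so the caller has to bind the returned object to a name of its own.

```python
import pandas as pd

def create_df():
    data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
            'year': [2000, 2001, 2002, 2001, 2002],
            'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
    df = pd.DataFrame(data)  # 'df' is local to create_df
    return df

create_df()  # the returned DataFrame is discarded here

try:
    df  # no name 'df' exists outside the function
    name_defined = True
except NameError:
    name_defined = False

result = create_df()  # bind the returned DataFrame to a name
print(name_defined)   # False
print(result.shape)   # (5, 3)
```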

I'm kind of late here, but what about creating a global variable within the function? It should save a step for you.

import pandas as pd

def create_df():
    global df  # bind df at module level instead of returning it

    data = {
        'state': ['Ohio','Ohio','Ohio','Nevada','Nevada'],
        'year': [2000,2001,2002,2001,2002],
        'pop': [1.5,1.7,3.6,2.4,2.9]
    }

    df = pd.DataFrame(data)

Then when you run create_df(), you'll be able to just use df.

Of course, be careful with your naming strategy in a large program, so that the value of df doesn't change unexpectedly as various functions execute.
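
The naming hazard can be sketched as follows. This is only an illustration: create_other_df is a hypothetical second function, not part of the original answer, that happens to reuse the same global name.

```python
import pandas as pd

def create_df():
    global df
    df = pd.DataFrame({'state': ['Ohio', 'Nevada'], 'pop': [1.5, 2.4]})

def create_other_df():
    # Hypothetical second function that writes to the same global name
    global df
    df = pd.DataFrame({'city': ['Reno']})

create_df()
cols_before = list(df.columns)  # ['state', 'pop']
create_other_df()               # silently replaces the global df
cols_after = list(df.columns)   # ['city']
print(cols_before, cols_after)
```

Returning the frame and assigning it at the call site avoids this, because each caller chooses its own variable name.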

EDIT: I noticed I got some points for this. Here's another (probably worse) way to do this using exec. This also allows for multiple dataframes to be created, if desired.

import pandas as pd

def create_df():
    data = {'state': ['Ohio','Ohio','Ohio','Nevada','Nevada'],
           'year': [2000,2001,2002,2001,2002],
           'pop': [1.5,1.7,3.6,2.4,2.9]}
    df = pd.DataFrame(data)
    return df

### We'll create three dataframes for an example
for i in range(3):
    exec(f'df_{i} = create_df()')

Then, you can test them out:

Input: df_0

Output:

    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9

Input: df_1

Output:

    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2001  2.4
4  Nevada  2002  2.9

Etc.
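
A commonly used alternative to exec, which is not part of the original answer, is to keep the frames in a dictionary keyed by name. The sketch below assumes the same create_df as in the answer; the name frames is illustrative.

```python
import pandas as pd

def create_df():
    data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
            'year': [2000, 2001, 2002, 2001, 2002],
            'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
    return pd.DataFrame(data)

# Store three dataframes under string keys instead of generating variable names
frames = {f'df_{i}': create_df() for i in range(3)}

print(sorted(frames))        # ['df_0', 'df_1', 'df_2']
print(frames['df_0'].shape)  # (5, 3)
```

Each frame is then reached as frames['df_0'], frames['df_1'], and so on, and the collection can be looped over without building code strings.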

A function can explicitly return two DataFrames:

import pandas as pd
import numpy as np

def return_2DF():

    date = pd.date_range('today', periods=20)
    DF1 = pd.DataFrame(np.random.rand(20, 3), index=date, columns=list('xyz'))

    DF2 = pd.DataFrame(np.random.rand(20, 4), index=date, columns='A B C D'.split())

    return DF1, DF2

Calling the function and unpacking the two data frames:

one, two = return_2DF()
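
As a small sketch of what this call does: a function ending in return DF1, DF2 returns a single tuple, which the caller can unpack. The function name return_two and the fixed start date are illustrative assumptions, chosen so the index is reproducible.

```python
import numpy as np
import pandas as pd

def return_two():
    date = pd.date_range('2024-01-01', periods=20)
    df1 = pd.DataFrame(np.random.rand(20, 3), index=date, columns=list('xyz'))
    df2 = pd.DataFrame(np.random.rand(20, 4), index=date, columns=list('ABCD'))
    return df1, df2  # packed into one tuple

result = return_two()
one, two = result  # tuple unpacking

print(type(result).__name__)  # tuple
print(one.shape, two.shape)   # (20, 3) (20, 4)
```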

You can return a dataframe from a function by making a copy of the dataframe, like this:

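import pandas as pd

def my_function(dataframe):
    my_df = dataframe.copy()  # work on a copy so the caller's frame is untouched
    my_df = my_df.drop(0)     # drop the row labelled 0
    return my_df

old_df = pd.DataFrame({'a': [1, 2, 3]})  # example input, not from the original post
new_df = my_function(old_df)
print(type(new_df))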

Output: <class 'pandas.core.frame.DataFrame'>

Dataframe_object.copy()

A deep copy needs to be performed to avoid one dataframe being a reference to another. This matters most when a function in a module (or a separate file) returns a dataframe. If you don't return DataFrame_object.copy(), the function only returns a reference to the dataframe created in the function.

If you are using a function in the same file, you might not even notice this deep copy / shallow copy issue if the function uses a global variable.
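
To make the reference behaviour concrete, here is a minimal sketch with two hypothetical helpers, drop_first_in_place and drop_first_copy (neither is from the original answer). It suggests the copy matters mainly when the function works on a frame that the caller also holds.

```python
import pandas as pd

def drop_first_in_place(dataframe):
    # Mutates the caller's object and returns that very same object
    dataframe.drop(0, inplace=True)
    return dataframe

def drop_first_copy(dataframe):
    # Works on an independent copy; the caller's frame is untouched
    my_df = dataframe.copy()
    my_df.drop(0, inplace=True)
    return my_df

old_df = pd.DataFrame({'a': [1, 2, 3]})

new_df = drop_first_copy(old_df)
print(len(old_df), len(new_df))  # 3 2
print(new_df is old_df)          # False

same_df = drop_first_in_place(old_df)
print(same_df is old_df)         # True
print(len(old_df))               # 2
```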

I have come across this issue before and solved it easily by assigning the output of the function to a variable outside the function.

import pandas as pd

def create_df():
    data = {'state': ['Ohio','Ohio','Ohio','Nevada','Nevada'],
           'year': [2000,2001,2002,2001,2002],
           'pop': [1.5,1.7,3.6,2.4,2.9]}
    df = pd.DataFrame(data)
    return df

df = create_df()


 