
Passing a Python dataframe to an object and altering the dataframe

I am new to Python and I am trying to pass an argument (a dataframe) to a function and change the value of that argument by reading an Excel file into it. (Assume that I have imported all the necessary modules.)

I have noticed that Python does not pass the argument by reference here, so the dataframe never ends up initialized or changed.

I read that Python passes arguments by object reference, rather than by value or by reference. However, I do not need to modify the same dataframe object.

The output is: <class 'pandas.core.frame.DataFrame'>

import pandas as pd
from pandas import DataFrame as df

class Data:
    x = df

    @staticmethod
    def import_File(df_name, file):
        df_name = pd.io.excel.read_excel(file.replace('"',''), sheetname='Sheet1', header=0, skiprows=None, skip_footer=0, index_col=None, parse_cols=None, parse_dates=True, date_parser=True, na_values=None, thousands=None, convert_float=True, has_index_names=None, converters=None, engine=None )


def inputdata():
    Data.import_File(Data.x, r"C:\Users\Data\try.xlsx")
    print(Data.x)

You seem to be doing a lot of things the hard way. I'll try to simplify it while conforming to standard patterns of use.

# Whatever imports you need
import pandas as pd


# Static variables and methods should generally be avoided.
# Change class and variable names to whatever is more suitable.
# Names should be meaningful when possible.
class MyData:

    # Load data in constructor. Could easily do this in another method.
    def __init__(self, filename):
        # Newer pandas versions name this keyword sheet_name (older ones used sheetname).
        self.data = pd.io.excel.read_excel(filename, sheet_name='Sheet1')


def inputData():
    # In my experience, forward slashes work just fine on Windows.
    # Create new MyData object using constructor
    x = MyData('C:/Users/Data/try.xlsx')

    # Access member variable from object
    print(x.data)

Here's the version where it loads in a method rather than the constructor.

import pandas as pd


class MyData:

    # Constructor
    def __init__(self):
        # Whatever setup you need
        self.data = None
        self.loaded = False

    # Method with optional argument
    def loadFile(self, filename, sheetname='Sheet1'):
        self.data = pd.io.excel.read_excel(filename, sheet_name=sheetname)
        self.loaded = True


def inputData():
    x = MyData()
    x.loadFile('C:/Users/Data/try.xlsx')
    print(x.data)

    # load some other data, using sheetname 'Sheet2' instead of default
    y = MyData()
    y.loadFile('C:/Users/Data/tryagain.xlsx', 'Sheet2')
    # can also pass arguments by name in any order like this:
    # y.loadFile(sheetname='Sheet2', filename='C:/Users/Data/tryagain.xlsx')
    print(y.data)

    # x and y both still exist with different data.
    # calling x.loadFile() again with a different path will overwrite its data.

Your original code doesn't save the result because assigning a new value to a parameter name inside a function only rebinds that local name; in Python it never changes the caller's variable. What you can do instead is change the object that was passed in, like this:

# Continuing from the last code block

def loadDefault(data):
    data.loadFile('C:/Users/Data/try.xlsx')

def testReference():
    x = MyData()
    loadDefault(x)
    # x.data now has been loaded
    print(x.data)


# Another example

def setIndex0(variable, value):
    variable[0] = value

def testSetIndex0():
    v = ['hello', 'world']
    setIndex0(v, 'Good morning')
    # v[0] now equals 'Good morning'
    print(v[0])

But you can't do this:

def setString(variable, value):
    # The only thing this changes is the value of variable inside this function.
    variable = value

def testSetString():
    v = 'Start'
    setString(v, 'Finish')
    # v is still 'Start'
    print(v)
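The same rule applies to dataframes. A minimal sketch (the function names add_total_column and replace_frame are hypothetical, chosen only for illustration): a function that mutates the DataFrame it receives is visible to the caller, while one that only rebinds its parameter is not.

```python
import pandas as pd


def add_total_column(frame):
    # Mutates the object the caller passed in, so the caller sees the new column.
    frame['total'] = frame['a'] + frame['b']


def replace_frame(frame):
    # Only rebinds the local name 'frame'; the caller's DataFrame is untouched.
    frame = pd.DataFrame({'other': [0]})


data = pd.DataFrame({'a': [1, 2], 'b': [10, 20]})

add_total_column(data)
print(list(data.columns))  # ['a', 'b', 'total']

replace_frame(data)
print(list(data.columns))  # still ['a', 'b', 'total']
```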

If you want to be able to specify the location to store a value using a name, you could use a data structure with indexes/keys. Dictionaries let you access and store values using a key.

import pandas as pd


class MyData:

    # Constructor
    def __init__(self):
        # make data a dictionary
        self.data = {}

    # Method with optional argument
    def loadFile(self, storename, filename, sheetname='Sheet1'):
        self.data[storename] = pd.io.excel.read_excel(filename, sheet_name=sheetname)

    # Access method
    def getData(self, name):
        return self.data[name]


def inputData():
    x = MyData()
    x.loadFile('name1', 'C:/Users/Data/try.xlsx')
    x.loadFile('name2', 'C:/Users/Data/tryagain.xlsx', 'Sheet2')

    # access Sheet1
    print(x.getData('name1'))

    # access Sheet2
    print(x.getData('name2'))
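If you don't need a class at all, a plain dictionary passed into a function gives the same "store it under a name" behaviour, because the function mutates the dictionary instead of rebinding a name. In this sketch store_frame is a hypothetical helper, and the DataFrames are built in memory rather than read from Excel so it runs without any files; in real use the third argument would be the result of read_excel.

```python
import pandas as pd


def store_frame(store, name, frame):
    # Writes into the dict the caller owns, so the caller sees the new key.
    store[name] = frame


frames = {}
store_frame(frames, 'name1', pd.DataFrame({'x': [1, 2]}))
store_frame(frames, 'name2', pd.DataFrame({'y': [3]}))

print(sorted(frames))  # ['name1', 'name2']
print(frames['name2'])
```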

If you really want the function to be static, then you don't need to make a new class at all. The main reason for creating a class is to use it as a reusable structure to hold data with methods specific to that data.

import pandas as pd

# wrap read_excel to make it easier to use
def loadFile(filename, sheetname='Sheet1'):
    return pd.io.excel.read_excel(filename, sheet_name=sheetname)

def inputData():
    x = loadFile('C:/Users/Data/try.xlsx')
    print(x)

    # the above is exactly the same as
    x = pd.io.excel.read_excel('C:/Users/Data/try.xlsx', sheet_name='Sheet1')
    print(x)
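A possible alternative to writing the wrapper by hand is functools.partial, which pre-fills keyword arguments of an existing function. This sketch assumes a pandas version that uses the sheet_name keyword (older releases used sheetname); load_sheet1 is a hypothetical name.

```python
import functools

import pandas as pd

# A callable equivalent to pd.read_excel(path, sheet_name='Sheet1').
load_sheet1 = functools.partial(pd.read_excel, sheet_name='Sheet1')

# Usage (needs the file to exist and an Excel engine such as openpyxl installed):
# x = load_sheet1('C:/Users/Data/try.xlsx')

# The pre-filled function and keywords can be inspected on the partial object.
print(load_sheet1.func is pd.read_excel)  # True
print(load_sheet1.keywords)               # {'sheet_name': 'Sheet1'}
```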

In your code, df is the DataFrame class itself rather than a DataFrame object, which is why printing Data.x shows <class 'pandas.core.frame.DataFrame'>. To create an empty dataframe you need to instantiate the class, and instantiating a class in Python uses function-call notation: df(). Also, we don't need to pass the default parameters when we read the Excel file, which keeps the code cleaner.
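A quick way to see the difference between the class and an instance of it (this uses only pandas itself, no Excel file):

```python
from pandas import DataFrame as df

x_class = df       # the DataFrame class object itself
x_instance = df()  # an empty DataFrame instance

print(x_class)           # <class 'pandas.core.frame.DataFrame'>
print(type(x_instance))  # <class 'pandas.core.frame.DataFrame'>
print(x_instance.empty)  # True
```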

import pandas as pd
from pandas import DataFrame as df

class Data:
    x = df()  # instantiate the class: an empty DataFrame

    @staticmethod
    def import_File(df_name, file):
        df_name = pd.io.excel.read_excel(file.replace('"',''), sheet_name='Sheet1')

When you pass Data.x to import_File(), df_name refers to the same object as Data.x, which in this case is an empty dataframe. However, when you assign the result of pd.io.excel.read_excel(file) to df_name, the connection between df_name and the empty dataframe is broken, and df_name now refers to the dataframe read from Excel. Data.x is not changed during this process, so it still refers to the empty dataframe object.
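The broken connection can be observed with the is operator. This sketch mirrors the original import_File, but substitutes a small in-memory DataFrame for the result of read_excel (an assumption made only so it runs without a file).

```python
import pandas as pd


class Data:
    x = pd.DataFrame()  # an empty DataFrame instance

    @staticmethod
    def import_File(df_name):
        # At this point df_name and Data.x refer to the same object.
        print(df_name is Data.x)  # True
        # Stand-in for pd.read_excel(...): rebinding df_name breaks the link.
        df_name = pd.DataFrame({'a': [1, 2]})
        print(df_name is Data.x)  # False


Data.import_File(Data.x)
print(Data.x.empty)  # True: Data.x is still the empty DataFrame
```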

A simpler way to see this with strings:

x = 'red'
df_name = x

We can break the connection between df_name and the string object 'red' and form a new one with the object 'excel':

df_name = 'excel'
print(x)
# Output: red

However, there's a simple fix: have import_File assign to Data.x directly, so the class attribute itself holds the Excel dataframe.

import pandas as pd
from pandas import DataFrame as df

class Data:
    x = df()

    @staticmethod
    def import_File(file):
        # Assign to the class attribute itself instead of to a local name.
        Data.x = pd.io.excel.read_excel(file.replace('"',''), sheet_name='Sheet1')

def inputdata():
    Data.import_File(r"C:\Users\Data\try.xlsx")
    print(Data.x)
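If you do keep a class-level attribute, one option is a classmethod, which receives the class as cls so the method doesn't hard-code the name Data. In this sketch the data is read with pd.read_csv from an in-memory buffer purely so it runs without an Excel file; in your code you would call pd.read_excel(file, sheet_name='Sheet1') instead. The method name import_file is hypothetical.

```python
import io

import pandas as pd


class Data:
    x = pd.DataFrame()

    @classmethod
    def import_file(cls, source):
        # Assigning to cls.x rebinds the class attribute itself,
        # so every later lookup of Data.x sees the new DataFrame.
        cls.x = pd.read_csv(source)  # stand-in for pd.read_excel(...)


Data.import_file(io.StringIO("a,b\n1,2\n3,4\n"))
print(Data.x.shape)  # (2, 2)
```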

However, I don't recommend using static methods here; you should include a constructor in your class, as the other answer recommends.
