How do you get two functions to return when using a user-defined function?

Question

I am just starting to use user-defined functions, so this is probably not a very complex question; forgive me.

I have a few dataframes that all have a column named 'interval_time' (for example). I would like to rename this column to 'Timestamp' and then make the renamed column the index.

I know that I can do this manually with:

df = df.rename(index=str, columns={'interval_time': 'Timestamp'})
df = df.set_index('Timestamp')

Now I would like to define a function that does this for me. I have seen that this works:

def rename_col(data, col_in='tempus_interval_time', col_out='Timestamp'):
    return data.rename(index=str, columns={col_in: col_out}, inplace=True)

But when I add the second call, it does not seem to do anything. If I define the second part as its own function and run it, it does seem to work.

I am trying this:

def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    return data.rename(index=str, columns={col_in: col_out}, inplace=True)
    return data.set_index('Timestamp', inplace=True)

The dataframes that I am using have the following form:

df_scada
              interval_time                 A         ...             X                 Y 
0       2010-11-01 00:00:00                0.0        ...                396.36710         381.68860
1       2010-11-01 00:05:00                0.0        ...                392.97974         381.40634
2       2010-11-01 00:10:00                0.0        ...                390.15695         379.99493
3       2010-11-01 00:15:00                0.0        ...                389.02786         379.14810

Answer 1

You don't need to return anything, because your operations are done in place. You can make the in-place changes in your function:

def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    data.rename(index=str, columns={col_in: col_out}, inplace=True)
    data.set_index(col_out, inplace=True)  # use col_out so a custom name also works

and any other references to the dataframe you pass into the function will see the changes made:

>>> import pandas as pd
>>> df = pd.DataFrame({'interval_time': pd.to_datetime(['2010-11-01 00:00:00', '2010-11-01 00:05:00', '2010-11-01 00:10:00', '2010-11-01 00:15:00']),
...     'A': [0.0] * 4}, index=range(4))
>>> df
     A       interval_time
0  0.0 2010-11-01 00:00:00
1  0.0 2010-11-01 00:05:00
2  0.0 2010-11-01 00:10:00
3  0.0 2010-11-01 00:15:00
>>> def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
...     data.rename(index=str, columns={col_in: col_out}, inplace=True)
...     data.set_index(col_out, inplace=True)
...
>>> rename_n_index(df, 'interval_time')
>>> df
                       A
Timestamp
2010-11-01 00:00:00  0.0
2010-11-01 00:05:00  0.0
2010-11-01 00:10:00  0.0
2010-11-01 00:15:00  0.0

In the above example, the df reference to the dataframe shows the changes made by the function.
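
For context on why the attempt in the question seemed to skip the second step: a Python function exits at the first return statement it executes, so a second return is never reached, and pandas methods called with inplace=True return None rather than the dataframe. A minimal sketch of both points (the function name two_returns and the sample data are made up for illustration, and the None return value reflects the pandas versions I am aware of):

```python
import pandas as pd

def two_returns(data):
    # Execution leaves the function at this first return...
    return data.rename(columns={'interval_time': 'Timestamp'}, inplace=True)
    # ...so this second return is never reached.
    return data.set_index('Timestamp', inplace=True)

df = pd.DataFrame({
    'interval_time': pd.to_datetime(['2010-11-01 00:00:00',
                                     '2010-11-01 00:05:00']),
    'A': [0.0, 0.0],
})

result = two_returns(df)
print(result)            # the value returned by the in-place rename
print(list(df.columns))  # the column was renamed, but it is still a column
print(df.index.name)     # the index was never set
```

So the rename does happen, but the function hands back None and the index is left alone.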

If you remove the inplace=True arguments, the method calls return a new dataframe object. You can store an intermediate result as a local variable, then apply the second method to the dataframe referenced in that local variable:

def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    renamed = data.rename(index=str, columns={col_in: col_out})
    return renamed.set_index(col_out)  # use col_out so a custom name also works

or you can chain the method calls directly to the returned object:

def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    return data.rename(index=str, columns={col_in: col_out})\
               .set_index(col_out)

Because renamed is already a new dataframe, you can also apply the set_index() call in place to that object and then return renamed:

def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    renamed = data.rename(index=str, columns={col_in: col_out})
    renamed.set_index(col_out, inplace=True)  # use col_out so a custom name also works
    return renamed

Either way, this returns a new dataframe object, leaving the original dataframe unchanged:

>>> def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
...     renamed = data.rename(index=str, columns={col_in: col_out})
...     return renamed.set_index(col_out)
...
>>> df = pd.DataFrame({'interval_time': pd.to_datetime(['2010-11-01 00:00:00', '2010-11-01 00:05:00', '2010-11-01 00:10:00', '2010-11-01 00:15:00']),
...     'A': [0.0] * 4}, index=range(4))
>>> rename_n_index(df, 'interval_time')
                       A
Timestamp
2010-11-01 00:00:00  0.0
2010-11-01 00:05:00  0.0
2010-11-01 00:10:00  0.0
2010-11-01 00:15:00  0.0
>>> df
     A       interval_time
0  0.0 2010-11-01 00:00:00
1  0.0 2010-11-01 00:05:00
2  0.0 2010-11-01 00:10:00
3  0.0 2010-11-01 00:15:00
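
Because this version returns a new object, the result has to be bound to a name to be kept; calling the function and ignoring its return value changes nothing. A short sketch along the lines of the session above (the variable name df_indexed is illustrative only):

```python
import pandas as pd

def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    # Neither call is in place, so each returns a new dataframe
    renamed = data.rename(index=str, columns={col_in: col_out})
    return renamed.set_index(col_out)

df = pd.DataFrame({
    'interval_time': pd.to_datetime(['2010-11-01 00:00:00',
                                     '2010-11-01 00:05:00']),
    'A': [0.0, 0.0],
})

rename_n_index(df, 'interval_time')               # result discarded, df untouched
df_indexed = rename_n_index(df, 'interval_time')  # result kept under a new name

print(df.columns.tolist())
print(df_indexed.index.name)
```

Writing df = rename_n_index(df, 'interval_time') instead would rebind the original name to the new dataframe.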

Answer 2

See @MartijnPieters' explanation for resolving the errors in your code.

However, note that the Pandorable method is to use method chaining. Some find it aesthetically pleasing to see method names visually aligned. Here's an example:

def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):

    renamed = data.rename(index=str, columns={col_in: col_out})\
                  .set_index(col_out)

    return renamed

Then, to apply this to a sequence of dataframes, as in your previous question:

dfs = [df.pipe(rename_n_index) for df in (df1, df2, df3)]
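
Note that df.pipe(rename_n_index) relies on the function's default col_in. DataFrame.pipe forwards extra positional and keyword arguments to the function it calls, so a different column name can be passed through it. A sketch assuming three small dataframes that each have an 'interval_time' column (df1, df2, df3 and their contents are invented here):

```python
import pandas as pd

def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    # Chain the two calls; each returns a new dataframe
    return (data.rename(columns={col_in: col_out})
                .set_index(col_out))

times = pd.to_datetime(['2010-11-01 00:00:00', '2010-11-01 00:05:00'])
df1 = pd.DataFrame({'interval_time': times, 'A': [0.0, 0.0]})
df2 = pd.DataFrame({'interval_time': times, 'A': [1.0, 1.0]})
df3 = pd.DataFrame({'interval_time': times, 'A': [2.0, 2.0]})

# Keyword arguments given to pipe are handed on to rename_n_index
dfs = [df.pipe(rename_n_index, col_in='interval_time') for df in (df1, df2, df3)]

# Rebind the original names if the converted frames should replace them
df1, df2, df3 = dfs
print(df1.index.name)
```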

2 answers

solution1
4 2018-07-06 14:44:40

solution2
2