I am just starting to use user-defined functions, so this is probably not a very complex question, forgive me.
I have a few dataframes, which all have a column named 'interval_time' (for example) and I would like to rename this column 'Timestamp', and then make this renamed column into the index.
I know that I can do this manually with this:
df = df.rename(index=str, columns={'interval_time': 'Timestamp'})
df = df.set_index('Timestamp')
but now I would like to define a function that does this for me. I have seen that this works:
def rename_col(data, col_in='tempus_interval_time', col_out='Timestamp'):
    return data.rename(index=str, columns={col_in: col_out}, inplace=True)
However, when I add the second step to the same function, it does not seem to do anything. If I define the second step as its own function and run it, it does seem to work.
This is what I am trying:
def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    return data.rename(index=str, columns={col_in: col_out}, inplace=True)
    return data.set_index('Timestamp', inplace=True)
The dataframes that I am using have the following form:
df_scada
interval_time A ... X Y
0 2010-11-01 00:00:00 0.0 ... 396.36710 381.68860
1 2010-11-01 00:05:00 0.0 ... 392.97974 381.40634
2 2010-11-01 00:10:00 0.0 ... 390.15695 379.99493
3 2010-11-01 00:15:00 0.0 ... 389.02786 379.14810
You don't need to return anything, because your operations are done in place. You can do the in-place changes in your function:
def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    data.rename(index=str, columns={col_in: col_out}, inplace=True)
    # Use col_out so the function still works when a different name is passed
    data.set_index(col_out, inplace=True)
and any other references to the dataframe you pass into the function will see the changes made:
>>> import pandas as pd
>>> df = pd.DataFrame({'interval_time': pd.to_datetime(['2010-11-01 00:00:00', '2010-11-01 00:05:00', '2010-11-01 00:10:00', '2010-11-01 00:15:00']),
...                    'A': [0.0] * 4}, index=range(4))
>>> df
     A       interval_time
0  0.0 2010-11-01 00:00:00
1  0.0 2010-11-01 00:05:00
2  0.0 2010-11-01 00:10:00
3  0.0 2010-11-01 00:15:00
>>> def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
...     data.rename(index=str, columns={col_in: col_out}, inplace=True)
...     data.set_index('Timestamp', inplace=True)
...
>>> rename_n_index(df, 'interval_time')
>>> df
                       A
Timestamp
2010-11-01 00:00:00  0.0
2010-11-01 00:05:00  0.0
2010-11-01 00:10:00  0.0
2010-11-01 00:15:00  0.0
In the above example, the df reference to the dataframe shows the changes made by the function.
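As for why the two-return version from the question appeared to do nothing: a function stops at its first return statement, and pandas methods called with inplace=True return None. Below is a minimal sketch of both effects. The name broken_rename_n_index and the two-row dataframe are made up for illustration, and index=str is left out for brevity.

```python
import pandas as pd


def broken_rename_n_index(data, col_in='interval_time', col_out='Timestamp'):
    # Same shape as the function in the question: the first return ends the
    # function, so the set_index() line below is never reached.
    return data.rename(columns={col_in: col_out}, inplace=True)
    return data.set_index(col_out, inplace=True)


# Small stand-in dataframe (hypothetical data)
df = pd.DataFrame({
    'interval_time': pd.to_datetime(['2010-11-01 00:00:00', '2010-11-01 00:05:00']),
    'A': [0.0, 0.0],
})

result = broken_rename_n_index(df)

print(result)            # None, because in-place methods return None
print(list(df.columns))  # the column was renamed, but it is still a column
print(df.index.name)     # None, because set_index() never ran
```

So the rename does happen, but the function hands back None and the index is never set.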
If you remove the inplace=True arguments, the method calls return a new dataframe object. You can store an intermediate result as a local variable, then apply the second method to the dataframe referenced in that local variable:
def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    renamed = data.rename(index=str, columns={col_in: col_out})
    # Index on col_out rather than a hard-coded name
    return renamed.set_index(col_out)
or you can chain the method calls directly to the returned object:
def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    return data.rename(index=str, columns={col_in: col_out})\
               .set_index(col_out)
Because renamed is already a new dataframe, you can also apply the set_index() call in-place to that object, then return just renamed:
def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    renamed = data.rename(index=str, columns={col_in: col_out})
    # renamed is a new object, so modifying it in place leaves data untouched
    renamed.set_index(col_out, inplace=True)
    return renamed
Either way, this returns a new dataframe object, leaving the original dataframe unchanged:
>>> def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
...     renamed = data.rename(index=str, columns={col_in: col_out})
...     return renamed.set_index('Timestamp')
...
>>> df = pd.DataFrame({'interval_time': pd.to_datetime(['2010-11-01 00:00:00', '2010-11-01 00:05:00', '2010-11-01 00:10:00', '2010-11-01 00:15:00']),
...                    'A': [0.0] * 4}, index=range(4))
>>> rename_n_index(df, 'interval_time')
                       A
Timestamp
2010-11-01 00:00:00  0.0
2010-11-01 00:05:00  0.0
2010-11-01 00:10:00  0.0
2010-11-01 00:15:00  0.0
>>> df
     A       interval_time
0  0.0 2010-11-01 00:00:00
1  0.0 2010-11-01 00:05:00
2  0.0 2010-11-01 00:10:00
3  0.0 2010-11-01 00:15:00
See @MartijnPieters' explanation for resolving the errors in your code.
However, note that the Pandorable method is to use method chaining. Some find it aesthetically pleasing to see method names visually aligned. Here's an example:
def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    renamed = data.rename(index=str, columns={col_in: col_out})\
                  .set_index(col_out)
    return renamed
Then, to apply this to a sequence of dataframes, as in your previous question:
dfs = [df.pipe(rename_n_index) for df in (df1, df2, df3)]
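Note that the list comprehension above relies on the default col_in='tempus_interval_time'. If your column is named 'interval_time', as in the sample data, pipe() forwards any extra arguments to the function. The following is a rough sketch under that assumption, with two made-up two-row dataframes standing in for the real ones:

```python
import pandas as pd


def rename_n_index(data, col_in='tempus_interval_time', col_out='Timestamp'):
    # Non-mutating version: returns a new dataframe, leaves data unchanged
    return data.rename(columns={col_in: col_out}).set_index(col_out)


# Stand-in dataframes (hypothetical data, not the real SCADA frames)
times = pd.to_datetime(['2010-11-01 00:00:00', '2010-11-01 00:05:00'])
df1 = pd.DataFrame({'interval_time': times, 'A': [0.0, 0.0]})
df2 = pd.DataFrame({'interval_time': times, 'X': [396.36710, 392.97974]})

# Keyword arguments given after the function are passed through by pipe()
dfs = [df.pipe(rename_n_index, col_in='interval_time') for df in (df1, df2)]

for out in dfs:
    print(out.index.name, list(out.columns))
```

Each element of dfs is a new dataframe indexed by Timestamp, while df1 and df2 keep their original interval_time column.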