Apply a function to a pandas data frame and store the output in a separate data frame

I have a data frame

cat input.csv
dwelling,wall,weather,occ,height,temp
5,2,Ldn,Pen,154.7,23.4
5,4,Ldn,Pen,172.4,28.7
3,4,Ldn,Pen,183.5,21.2
3,4,Ldn,Pen,190.2,30.3

To which I'm trying to apply the following function:

import pandas as pd

input_df = pd.read_csv('input.csv')


def folder_column(row):
    if row['dwelling'] == 5 and row['wall'] == 2:
        return 'folder1'
    elif row['dwelling'] == 3 and row['wall'] == 4:
        return 'folder2'
    else:
        return 0

I want to run the function on the input dataset and store the output in a separate data frame using something like this:

temp_df = pd.DataFrame()
temp_df = input_df['archetype_folder'] = input_df.apply(folder_column, axis=1)

But when I do this I only get the newly created 'archetype_folder' in the temp_df, when I would like all the original columns from the input_df. Can anyone help? Note that I don't want to add the new column 'archetype_folder' to the original, input_df. I've also tried this:

temp_df = input_df

temp_df['archetype_folder'] = temp_df.apply(folder_column, axis=1)

But when I run the second command both input_df and temp_df end up with the new column?

Any help is appreciated!

You need to create a copy of the original DataFrame and then assign the return values of your function to the copy. Consider the following simple example:

import pandas as pd
def is_odd(row):
    return row.value % 2 == 1
df1 = pd.DataFrame({"value":[1,2,3],"name":["uno","dos","tres"]})
df2 = df1.copy()
df2["odd"] = df1.apply(is_odd,axis=1)
print(df1)
print("=====")
print(df2)

This gives the output:

   value  name
0      1   uno
1      2   dos
2      3  tres
=====
   value  name    odd
0      1   uno   True
1      2   dos  False
2      3  tres   True

Use DataFrame.copy:

temp_df = input_df.copy()

temp_df['archetype_folder'] = temp_df.apply(folder_column, axis=1)
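
The reason the second attempt in the question changed both frames is that plain assignment (temp_df = input_df) only binds a second name to the same object, whereas .copy() creates an independent DataFrame. A minimal sketch of the difference, using the rows from the question read from an in-memory string rather than a file; the names alias_df and copy_df are illustrative only, and DataFrame.assign is shown as one alternative that also returns a new DataFrame and leaves the original untouched:

```python
import io
import pandas as pd

# Rows copied from the question's input.csv, read from memory for convenience
csv_text = """dwelling,wall,weather,occ,height,temp
5,2,Ldn,Pen,154.7,23.4
5,4,Ldn,Pen,172.4,28.7
3,4,Ldn,Pen,183.5,21.2
3,4,Ldn,Pen,190.2,30.3
"""
input_df = pd.read_csv(io.StringIO(csv_text))


def folder_column(row):
    # Same logic as the function in the question
    if row['dwelling'] == 5 and row['wall'] == 2:
        return 'folder1'
    elif row['dwelling'] == 3 and row['wall'] == 4:
        return 'folder2'
    else:
        return 0


alias_df = input_df        # hypothetical name: a second reference to the same object
copy_df = input_df.copy()  # hypothetical name: an independent copy

# assign() returns a new DataFrame with the extra column; input_df is not modified
temp_df = input_df.assign(archetype_folder=input_df.apply(folder_column, axis=1))

print(alias_df is input_df)                    # same object
print(copy_df is input_df)                     # different object
print('archetype_folder' in input_df.columns)  # original is unchanged
print(temp_df)
```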

You don't need apply. Boolean masks with .loc are vectorized, so they are more efficient:

temp_df = input_df.copy()
m1 = (input_df['dwelling'] == 5) & (input_df['wall'] == 2)
m2 = (input_df['dwelling'] == 3) & (input_df['wall'] == 4)
temp_df.loc[m1, 'archetype_folder'] = 'folder1'
temp_df.loc[m2, 'archetype_folder'] = 'folder2'
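
One difference from the original function is worth noting: with the .loc approach, rows that match neither mask are left as NaN, while folder_column returns 0 for them. If the 0 default matters, one possible sketch is to create the column first and then overwrite the matching rows. The object dtype here is an assumption made so that the column can hold both the integer 0 and the folder strings; the sample rows are again read from an in-memory string:

```python
import io
import pandas as pd

# Rows copied from the question's input.csv, read from memory for convenience
csv_text = """dwelling,wall,weather,occ,height,temp
5,2,Ldn,Pen,154.7,23.4
5,4,Ldn,Pen,172.4,28.7
3,4,Ldn,Pen,183.5,21.2
3,4,Ldn,Pen,190.2,30.3
"""
input_df = pd.read_csv(io.StringIO(csv_text))

temp_df = input_df.copy()
m1 = (input_df['dwelling'] == 5) & (input_df['wall'] == 2)
m2 = (input_df['dwelling'] == 3) & (input_df['wall'] == 4)

# Start every row at 0 (object dtype so strings can be stored alongside it),
# then overwrite only the rows selected by each mask
temp_df['archetype_folder'] = pd.Series(0, index=temp_df.index, dtype=object)
temp_df.loc[m1, 'archetype_folder'] = 'folder1'
temp_df.loc[m2, 'archetype_folder'] = 'folder2'

print(temp_df)
print(input_df.columns.tolist())  # input_df still has only the original columns
```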

