Is there a way to return a pandas dataframe with a modified column?

Say I have a dataframe df with a column "age". Say "age" has some NaN values and I want to create two new dataframes, dfMean and dfMedian, which fill in the NaN values differently. This is the way I would do it:

# Step 1:
dfMean = df.copy()  # copy first; plain assignment would only alias df
dfMean["age"] = dfMean["age"].fillna(df["age"].mean())
# Step 2:
dfMedian = df.copy()
dfMedian["age"] = dfMedian["age"].fillna(df["age"].median())

I'm curious whether there's a way to do each of these steps in one line instead of two, by returning the modified dataframe without needing to copy the original. But I haven't been able to find anything so far. Thanks, and let me know if I can clarify or if you have a better title in mind for the question:)

Doing dfMean = dfMean["age"].fillna(df["age"].mean()) creates a Series, not a DataFrame.

To add the two filled Series as new columns of your DataFrame, use:

df2 = df.assign(age_fill_mean=df["age"].fillna(df["age"].mean()),
                age_fill_median=df["age"].fillna(df["age"].median()),
                )
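
As a rough check of the assign() approach, here is a minimal sketch on made-up data. The sample ages are an assumption for illustration only; the point is that assign() returns a new DataFrame and the original df keeps its NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({"age": [20.0, np.nan, 30.0, 70.0]})

# assign() returns a new DataFrame with the extra columns;
# df itself is not modified.
df2 = df.assign(
    age_fill_mean=df["age"].fillna(df["age"].mean()),
    age_fill_median=df["age"].fillna(df["age"].median()),
)

print(df2)
print(df["age"].isna().sum())  # the original still has its missing value
```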

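Alternatively, pandas.DataFrame.agg() (an alias of DataFrame.aggregate()) computes both statistics in one call. It returns the mean and median themselves rather than a filled DataFrame, so the results still have to be passed to fillna. From the documentation: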

"Aggregate using one or more operations over the specified axis."

df.agg({'age' : ['mean', 'median']})
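
One possible way to go from the agg() result to the two filled DataFrames is sketched below. The sample data is hypothetical, and combining agg() with assign() this way is an assumption about how the pieces might fit together, not something spelled out in the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({"age": [20.0, np.nan, 30.0, 70.0]})

# stats is a small DataFrame indexed by 'mean' and 'median',
# with one column per aggregated column (here only 'age').
stats = df.agg({"age": ["mean", "median"]})

# Overwrite 'age' in a new DataFrame each time; df stays unchanged.
dfMean = df.assign(age=df["age"].fillna(stats.loc["mean", "age"]))
dfMedian = df.assign(age=df["age"].fillna(stats.loc["median", "age"]))

print(stats)
print(dfMean)
print(dfMedian)
```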

No, you still need to define the two new DataFrames separately. Use DataFrame.fillna with a dictionary that maps each column name to the value that should replace its missing values:

dfMean = df.fillna({'age': df["age"].mean()})
dfMedian = df.fillna({'age': df["age"].median()})

One line is:

dfMean, dfMedian = df.fillna({'age': df["age"].mean()}), df.fillna({'age': df["age"].median()})
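
If repeating the fillna expression feels noisy, a variation is to loop over the statistic names. This is only a sketch on hypothetical data (the "name" column and the values are invented), and it relies on Series.agg() accepting a function name as a string.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "age": [20.0, np.nan, 30.0, 70.0],
    "name": ["a", "b", "c", "d"],
})

# Build one filled DataFrame per statistic and unpack the results.
dfMean, dfMedian = (
    df.fillna({"age": df["age"].agg(how)}) for how in ("mean", "median")
)

print(dfMean)
print(dfMedian)
```

Each result is a full DataFrame with every original column, and df itself keeps its NaN because fillna returns a new object by default.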
