I am trying to finish a pandas course using Python on DataCamp and ran into an issue. I have the solution, but I want to understand it. The quiz is simple: apply NumPy functions to grouped data.
This is the hint they give for this small quiz:
.agg() can take in a list of functions. The functions shouldn't be called, so don't use parentheses with them.
This was my code to find the min, max, mean, and median of weekly_sales for each store type:
sales_stats = sales.groupby("type")["weekly_sales"].agg([np.min(), np.max(), np.mean(), np.median()])
and this is the error:
File "<stdin>", line 4, in mean
TypeError: _mean_dispatcher() missing 1 required positional argument: 'a'
so I changed it to:
sales_stats = sales.groupby("type")["weekly_sales"].agg([np.mean(sales["weekly_sales"]),np.median,np.min,np.max])
but other errors occurred, so I looked at the solution:
sales_stats = sales.groupby("type")["weekly_sales"].agg([np.min, np.max, np.mean, np.median])
Does that mean we don't have to pass any arguments to these NumPy functions, and that .agg() will pass "weekly_sales" as the argument to each of them? If so, what if I want to pass two columns to these functions, for example weekly_sales and monthly_sales?
Is this the right way?
sales_stats = sales.groupby("type")["weekly_sales","monthly_sales"].agg([np.min, np.max, np.mean, np.median])
You're very close, but the correct syntax would be:
sales_stats = (
    sales.groupby("type")[["weekly_sales", "monthly_sales"]]
    .agg([np.min, np.max, np.mean, np.median])
)
This is because selecting multiple columns from a DataFrame, or in this case a GroupBy object, requires a list of column names. This snippet returns the minimum, maximum, mean, and median of both the "weekly_sales" and "monthly_sales" columns, grouped by "type".
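To make the list-based selection concrete, here is a minimal sketch on a made-up DataFrame. The values are invented for illustration and only stand in for the course's `sales` data. The aggregations are written as string names, which pandas accepts in the same place as function objects.

```python
import pandas as pd

# Hypothetical data standing in for the course's `sales` DataFrame.
sales = pd.DataFrame({
    "type": ["A", "A", "B", "B"],
    "weekly_sales": [10.0, 20.0, 5.0, 7.0],
    "monthly_sales": [40.0, 80.0, 20.0, 28.0],
})

# Double brackets: the inner list selects two columns from the GroupBy object.
sales_stats = (
    sales.groupby("type")[["weekly_sales", "monthly_sales"]]
    .agg(["min", "max", "mean", "median"])
)

# The result has one row per store type and one (column, statistic) pair
# per aggregated value.
print(sales_stats)
```

Individual values can then be read with a (column, statistic) pair, for example `sales_stats.loc["A", ("weekly_sales", "mean")]`.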
You asked: "Does that mean we don't have to pass any arguments to these NumPy functions, and that .agg() will pass "weekly_sales" as the argument to each of them? If so, what if I want to pass two columns to these functions, for example weekly_sales and monthly_sales? Is this the right way?"
Yes. pandas passes the arguments (here, the values of each group) to the aggregating functions under the hood, which is why you give .agg() the function objects themselves instead of calling them.
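One way to see this for yourself is to hand .agg() a function of your own that records what it receives. This is only a sketch: the data is made up, and `value_range` and `received` are hypothetical names introduced for the illustration.

```python
import pandas as pd

# Hypothetical data standing in for the course's `sales` DataFrame.
sales = pd.DataFrame({
    "type": ["A", "A", "B"],
    "weekly_sales": [10.0, 20.0, 5.0],
})

received = []  # records the values pandas hands to the function

def value_range(group_values):
    # pandas calls this for each group, passing that group's weekly_sales values.
    received.append(list(group_values))
    return group_values.max() - group_values.min()

# Note: value_range is passed without parentheses; pandas does the calling.
result = sales.groupby("type")["weekly_sales"].agg(value_range)

print(received)
print(result)
```

The function is never called in your own code. You only name it, and pandas supplies the argument for each group, which is the same thing that happens with np.min, np.max, np.mean, and np.median.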
If you want some more fine-grained control, you can pass a dictionary like so:
sales_stats = (
    sales.groupby("type")
    .agg({
        "weekly_sales": np.mean,
        "monthly_sales": [np.min, np.max]
    })
)
This will return the mean of "weekly_sales" as well as the min & max of "monthly_sales". Check out some of the examples from the [