Pandas apply function to multiple columns in a list
I'm attempting to write a function that returns a modified dataframe with all outliers removed from the columns listed in my variable num_vars. Here is my function so far:
def remove_outliers(column):
    Q1 = np.percentile(df[column], 25, interpolation = 'midpoint')
    Q3 = np.percentile(df[column], 75, interpolation = 'midpoint')
    IQR = Q3 - Q1
    IQR_mult = IQR * 1.5
    ceiling = Q3 + IQR_mult
    floor = Q1 - IQR_mult
    return df[(df[column] <= ceiling) & (df[column] >= floor)]
The columns I want to apply this function to are stored in:
num_vars = ['host_response_rate', 'accommodates', 'bedrooms', 'beds', 'minimum_nights', 'availability_30', 'number_of_reviews', 'review_scores_rating', 'review_scores_cleanliness', 'review_scores_checkin', 'review_scores_communication', 'review_scores_location', 'review_scores_value', 'time_from_last_review', 'num_amenities', 'price']
The function works when I call it on a single column such as 'price', but it does not return a clean dataframe when I pass several columns at once. How can I make the function take all of these columns at once and return a dataframe from which all outliers have been removed?
You can change your remove_outliers function to accept row and column arguments, then loop over the columns and apply the function to each row, as below:
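import numpy as np

def remove_outliers(row, column):
    # Quartiles of the whole column (recomputed for every row, so this is slow on large frames).
    # In NumPy 1.22 and later the 'interpolation' keyword is named 'method'.
    Q1 = np.percentile(df[column], 25, interpolation='midpoint')
    Q3 = np.percentile(df[column], 75, interpolation='midpoint')
    IQR = Q3 - Q1
    IQR_mult = IQR * 1.5
    ceiling = Q3 + IQR_mult
    floor = Q1 - IQR_mult
    # Keep the value if it lies inside the fences, otherwise mark it as missing.
    # The fences are compared as-is; casting floor to int would shift the lower bound.
    if floor <= row[column] <= ceiling:
        return row[column]
    else:
        return None

num_vars = ['host_response_rate', 'accommodates', 'bedrooms', 'beds', 'minimum_nights', 'availability_30',
            'number_of_reviews', 'review_scores_rating', 'review_scores_cleanliness', 'review_scores_checkin',
            'review_scores_communication', 'review_scores_location', 'review_scores_value', 'time_from_last_review',
            'num_amenities', 'price']

# [:3] limits the loop to the first three columns; use num_vars to process all of them
for col in num_vars[:3]:
    df[col] = df.apply(lambda row: remove_outliers(row, col), axis=1)

# Drop every row that now contains a missing value (in any column of df)
df = df.dropna().reset_index(drop=True)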
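The row-by-row approach above recomputes the quartiles for every row, which can get slow on a large frame. As an alternative, here is a minimal vectorized sketch, not part of the original answer. It assumes the function takes the dataframe and the list of columns as arguments instead of reading a global df. The parameter name k and the toy data are made up for illustration. It uses pandas' own quantile with interpolation='midpoint' rather than np.percentile.

```python
import pandas as pd


def remove_outliers(df, columns, k=1.5):
    """Return a copy of df keeping only rows inside the IQR fences of every listed column."""
    # One quartile per column; pandas skips NaN when computing quantiles
    q1 = df[columns].quantile(0.25, interpolation="midpoint")
    q3 = df[columns].quantile(0.75, interpolation="midpoint")
    iqr = q3 - q1
    floor = q1 - k * iqr
    ceiling = q3 + k * iqr
    # Compare each column against its own fence (the Series are aligned on column labels)
    within = df[columns].ge(floor, axis="columns") & df[columns].le(ceiling, axis="columns")
    # A row survives only if it is inside the fences in all listed columns;
    # rows with NaN in a listed column fail the comparison and are dropped too
    return df[within.all(axis=1)].reset_index(drop=True)


# Hypothetical toy data, just to show the call
toy = pd.DataFrame({
    "price": [11, 12, 11, 13, 1000, 12],
    "beds": [1, 2, 2, 1, 2, 50],
    "name": ["a", "b", "c", "d", "e", "f"],
})
cleaned = remove_outliers(toy, ["price", "beds"])
print(cleaned)
```

With the question's data the call would be df = remove_outliers(df, num_vars). Note one design difference: all fences here are computed once from the unfiltered data, whereas filtering column by column in sequence computes later quartiles on rows that already survived earlier filters, so the two approaches can keep slightly different rows.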