How to apply the same function with different input arguments to create new columns in a pandas dataframe?
I have this sample dataframe:
x_mean x_min x_max y_mean y_min y_max
1 85.6 3 264 75.7 3 240
2 105.5 6 243 76.4 3 191
3 95.8 19 287 48.4 8 134
4 85.5 50 166 64.8 32 103
5 55.9 24 117 46.7 19 77
x_range = [list(range(0,50)),list(range(51,100)),list(range(101,250)),list(range(251,350)),list(range(351,430)),list(range(431,1000))]
y_range = [list(range(0,30)),list(range(31,60)),list(range(61,90)),list(range(91,120)),list(range(121,250)),list(range(251,2000))]
#here x = Any column with mean value (eg. x_mean or y_mean)
# y = x_range / y_range
def min_max_range(x,y):
    for a in y:
        if int(x) in a:
            min_val = min(a)
            max_val = max(a)+1
            return max_val - min_val
def min_range(x,y):
    for a in y:
        if int(x) in a:
            min_val = min(a)
            return min_val
Now I want to apply the functions min_max_range() and min_range() to the columns x_mean and y_mean to get new columns.
For example, the function min_max_range() takes the column x_mean and the range x_range as input to create the column x_min_max_val; similarly, the column y_mean and the range y_range are used for the column y_min_max_val:
I can create each column one by one with the one-liners below, but I want to apply this to both x_mean and y_mean in one go with a single one-liner.
df['x_min_max_val'] = df['x_mean'].apply(lambda x: min_max_range(x,x_range))
df['y_min_max_val'] = df['y_mean'].apply(lambda x: min_max_range(x,y_range))
The resulting dataframe should look like this:
x_mean x_min x_max y_mean y_min y_max x_min_max_val y_min_max_val x_min_val y_min_val
1 85.6 3 264 75.7 3 240 49 29 51 61
2 105.5 6 243 76.4 3 191 149 29 101 61
3 95.8 19 287 48.4 8 134 49 29 51 31
4 85.5 50 166 64.8 32 103 49 29 51 61
5 55.9 24 117 46.7 19 77 49 29 51 31
I want to create these columns in one go, instead of creating one column at a time. How can I do this? Any suggestions? Or could something like this work?
df.filter(regex='mean').apply(lambda x: min_max_range(x,x+'_range'))
Here is the general approach. First, store your ranges in a dictionary so that they can be looked up by name.
range_dict = {}
range_dict['x_range'] = x_range
range_dict['y_range'] = y_range
You also need a list of the columns you want to run the calculation on (or you can use a regex to select them if their names follow a specific pattern):
mean_cols_list = ['x_mean', 'y_mean']
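The regex route mentioned above could look like the following minimal sketch. It assumes the columns of interest all end in `_mean`; the small dataframe is a cut-down copy of the sample data, built here only for illustration.

```python
import pandas as pd

# Cut-down copy of the sample data, for illustration only
df = pd.DataFrame({
    "x_mean": [85.6, 105.5],
    "x_min": [3, 6],
    "y_mean": [75.7, 76.4],
    "y_min": [3, 3],
})

# Keep only the columns whose name ends in "_mean"
mean_cols_list = df.filter(regex=r"_mean$").columns.tolist()
print(mean_cols_list)  # ['x_mean', 'y_mean']
```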
Now, to apply your function over all of these columns, define a function like this:
def min_max_calculator(df, range_dictionary, mean_columns_list):
    for current_column in mean_columns_list:
        # e.g. 'x_mean' -> 'x_min_max_val'
        output_col_name = current_column.replace('mean','min_max_val')
        # e.g. 'x_mean' -> 'x_range'
        range_name = current_column.replace('mean','range')
        # the list of ranges stored under that name, e.g. x_range
        range_list = range_dictionary[range_name]
        # add the calculated column to the dataframe
        df[output_col_name] = df[current_column].apply(lambda x: min_max_range(x,range_list))
    return df

df_output = min_max_calculator(df, range_dict, mean_cols_list)
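If a single expression is preferred over a helper function, one possible sketch builds all four columns with a dict comprehension passed to `DataFrame.assign`. The `funcs` mapping and the `new_cols` name are assumptions introduced here, and `range` objects are used in place of the original lists purely for brevity (membership tests, `min` and `max` behave the same on them). `min_range` is included as well, although the function above only covers `min_max_range`.

```python
import pandas as pd

# Only the mean columns of the sample data are needed for this sketch
df = pd.DataFrame(
    {
        "x_mean": [85.6, 105.5, 95.8, 85.5, 55.9],
        "y_mean": [75.7, 76.4, 48.4, 64.8, 46.7],
    },
    index=[1, 2, 3, 4, 5],
)

x_range = [range(0, 50), range(51, 100), range(101, 250),
           range(251, 350), range(351, 430), range(431, 1000)]
y_range = [range(0, 30), range(31, 60), range(61, 90),
           range(91, 120), range(121, 250), range(251, 2000)]
range_dict = {"x_range": x_range, "y_range": y_range}


def min_max_range(x, y):
    # Width of the range that contains int(x)
    for a in y:
        if int(x) in a:
            return max(a) + 1 - min(a)


def min_range(x, y):
    # Lower bound of the range that contains int(x)
    for a in y:
        if int(x) in a:
            return min(a)


# Hypothetical mapping: suffix of each new column -> function that fills it
funcs = {"min_max_val": min_max_range, "min_val": min_range}

# One new column per (function, mean column) pair
new_cols = {
    col.replace("mean", suffix): df[col].apply(
        func, args=(range_dict[col.replace("mean", "range")],)
    )
    for suffix, func in funcs.items()
    for col in df.filter(regex=r"_mean$").columns
}
df_output = df.assign(**new_cols)

print(df_output["x_min_max_val"].tolist())  # [49, 149, 49, 49, 49]
print(df_output["y_min_val"].tolist())      # [61, 61, 31, 61, 31]
```

Note that the ranges as defined leave gaps: because the end of a `range` is exclusive, a value whose integer part is 50 or 100 (for x) matches no range, so both functions return `None` for it and that row is left without a value.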