Randomly selecting k values from n columns of a dataframe for each row and storing them in k columns of the same dataframe
My dataframe consists of 1M records in the following format.
ID SEGMENT group CODE_1 CODE_2 CODE_3 CODE_4 CODE_5 CODE_6 CODE_7 CODE_8 CODE_9 CODE_10
100006 History ML1 Offer_25 Offer_4 Offer_8 Offer_10 Offer_2 Offer_9 Offer_3 Offer_1 Offer_7 Offer_12
100007 History ML1 Offer_35 Offer_4 Offer_18 Offer_10 Offer_22 Offer_9 Offer_3 Offer_1 Offer_7 Offer_12
1000065 History ML1 Offer_5 Offer_40 Offer_8 Offer_1 Offer_21 Offer_9 Offer_3 Offer_1 Offer_7 Offer_13
10001 History ML1 Offer_5 Offer_41 Offer_18 Offer_15 Offer_2 Offer_19 Offer_3 Offer_11 Offer_7 Offer_12
900010 History ML1 Offer_15 Offer_4 Offer_18 Offer_10 Offer_20 Offer_19 Offer_3 Offer_6 Offer_7 Offer_12
Now I want to keep ID, SEGMENT, group and CODE_1 to CODE_4 as they are, but replace the remaining columns with just two columns, CODE_5 and CODE_6, where for each row two distinct values are drawn at random from that row's values in CODE_5 to CODE_10.
The result should look like this:
ID SEGMENT group CODE_1 CODE_2 CODE_3 CODE_4 CODE_5 CODE_6
100006 History ML1 Offer_25 Offer_4 Offer_8 Offer_10 Offer_1 Offer_12
100007 History ML1 Offer_35 Offer_4 Offer_18 Offer_10 Offer_7 Offer_9
1000065 History ML1 Offer_5 Offer_40 Offer_8 Offer_1 Offer_13 Offer_3
10001 History ML1 Offer_5 Offer_41 Offer_18 Offer_15 Offer_2 Offer_19
900010 History ML1 Offer_15 Offer_4 Offer_18 Offer_10 Offer_12 Offer_6
I tried something like this, but it is too slow:
offers_cat = pd.DataFrame([], columns = ['Code_5','Code_6'])
recommend_df_test = recommend_df
number_of_offers = 6
variety_offers = 2
offer_range = number_of_offers - variety_offers
new_df = pd.DataFrame()
for index, row in recommend_df_test.iterrows():
    list_append = []
    lst_tmp =[]
    for i in range (offer_range+1,number_of_offers+5):
        offer_code = "CODE_"+str(i)
        list_append.append(row[offer_code])
    lst_tmp.append(np.random.choice(list_append,size=variety_offers,replace=False))
    df_tmp = pd.DataFrame(lst_tmp, columns=offers_cat.columns)
    df_tmp["ID"] = row["ID"]
    new_df = pd.concat([new_df,df_tmp])
This code gives me a new dataframe with the ID and two offers chosen at random for each row from CODE_5 to CODE_10.
Please help me improve the performance.
What you need is to apply a row-wise function to one of your columns. Assume a data frame like this:
import pandas

df = pandas.DataFrame(
    [['a1', 'b1', 'c1'], ['a2', 'b2', 'c2'], ['a3', 'b3', 'c3']],
    columns=('A', 'B', 'C')
)
The output would be:
A B C
0 a1 b1 c1
1 a2 b2 c2
2 a3 b3 c3
Now you want to replace column A (or create a new column, it doesn't matter) by randomly choosing one of the other columns' values on the same row. Here is how you do it:
import numpy as np
cols = ['B', 'C']
df.A = df.apply(
    lambda r: np.random.choice(r[cols]),
    axis=1
)
Here I have used apply to run a mapping function over the whole data frame. axis=1 tells the method to apply the function to each row. The lambda function takes the row values r and passes the values of the columns of interest, cols=['B','C'], to the random choice function from numpy. Since the choice is random, the result will vary from run to run; one possible result is:
A B C
0 b1 b1 c1
1 b2 b2 c2
2 c3 b3 c3
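The same apply pattern can be extended to draw k distinct columns per row by passing size and replace=False to np.random.choice and returning a Series, so that apply expands the result into several columns. The following is only a sketch: the extra column D and the names k, picked, pick_1 and pick_2 are made up for illustration. It still calls a Python function once per row, so on 1M rows it is likely to remain much slower than a fully vectorized approach.

```python
import numpy as np
import pandas as pd

# Toy frame; column D is added here only so there are three candidates.
df = pd.DataFrame(
    [['a1', 'b1', 'c1', 'd1'], ['a2', 'b2', 'c2', 'd2'], ['a3', 'b3', 'c3', 'd3']],
    columns=('A', 'B', 'C', 'D'),
)
cols = ['B', 'C', 'D']  # candidate columns to draw from
k = 2                   # number of values to draw per row

np.random.seed(0)  # for repeatability
# Returning a Series makes apply produce one output column per drawn value.
picked = df.apply(
    lambda r: pd.Series(np.random.choice(r[cols].to_numpy(), size=k, replace=False)),
    axis=1,
)
picked.columns = ['pick_1', 'pick_2']

result = pd.concat([df[['A']], picked], axis=1)
print(result)
```

replace=False guarantees that the two draws come from different columns of the row; the values themselves are distinct only if the row holds no duplicates among the candidate columns.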
Here's what I would do:
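import numpy as np
import pandas as pd

# for repeatability
np.random.seed(1)
# sampling column positions among the last 5 columns, 2 for each row
# (drawn with replacement, so a row can get the same column twice)
a = np.random.choice(range(5), size=len(df)*2)
# sampling the values given the columns
new_values = df.iloc[:,-5:].values[np.repeat(range(len(df)),2), a].reshape(-1,2)
# creating new data: everything except the last 5 columns, plus the 2 sampled columns
pd.concat([df.iloc[:,:-5],
           pd.DataFrame(new_values, columns=('Code_5', 'Code_6'))],
          axis=1)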
Output:
ID SEGMENT group CODE_1 CODE_2 CODE_3 CODE_4 CODE_5 Code_5 Code_6
-- ------- ------- --------- -------- -------- -------- -------- -------- -------- --------
0 100006 History ML1 Offer_25 Offer_4 Offer_8 Offer_10 Offer_2 Offer_7 Offer_12
1 100007 History ML1 Offer_35 Offer_4 Offer_18 Offer_10 Offer_22 Offer_9 Offer_3
2 1000065 History ML1 Offer_5 Offer_40 Offer_8 Offer_1 Offer_21 Offer_7 Offer_9
3 10001 History ML1 Offer_5 Offer_41 Offer_18 Offer_15 Offer_2 Offer_19 Offer_3
4 900010 History ML1 Offer_15 Offer_4 Offer_18 Offer_10 Offer_20 Offer_12 Offer_12
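Note that the code above draws with replacement from the last five columns, so the original CODE_5 is kept and a row can receive the same value twice (row 4 gets Offer_12 twice). If the two values must come from different columns among CODE_5 to CODE_10, one possible variation is to rank a matrix of random numbers within each row and keep the first k positions. This is a minimal sketch and not part of the answer above: the helper name sample_k_per_row, its parameters and the two-row example frame are made up for illustration, and it assumes the candidate columns are the last n_candidates columns of the frame.

```python
import numpy as np
import pandas as pd

def sample_k_per_row(df, n_candidates=6, k=2, seed=None):
    """Draw k values per row from k different columns among the last n_candidates."""
    rng = np.random.default_rng(seed)
    candidates = df.iloc[:, -n_candidates:].to_numpy()
    # Sorting i.i.d. uniform noise row by row yields an independent random
    # permutation of column positions for every row; keep the first k.
    picks = rng.random(candidates.shape).argsort(axis=1)[:, :k]
    values = np.take_along_axis(candidates, picks, axis=1)
    kept = df.iloc[:, :-n_candidates].reset_index(drop=True)
    # Reuse the names of the first k candidate columns (CODE_5, CODE_6 here).
    new_cols = list(df.columns[-n_candidates:][:k])
    return pd.concat([kept, pd.DataFrame(values, columns=new_cols)], axis=1)

# Two rows copied from the question, used only as a small example.
example = pd.DataFrame(
    [[100006, "History", "ML1", "Offer_25", "Offer_4", "Offer_8", "Offer_10",
      "Offer_2", "Offer_9", "Offer_3", "Offer_1", "Offer_7", "Offer_12"],
     [100007, "History", "ML1", "Offer_35", "Offer_4", "Offer_18", "Offer_10",
      "Offer_22", "Offer_9", "Offer_3", "Offer_1", "Offer_7", "Offer_12"]],
    columns=["ID", "SEGMENT", "group"] + [f"CODE_{i}" for i in range(1, 11)],
)
result = sample_k_per_row(example, seed=1)
print(result)
```

Because the positions are distinct, the two values differ whenever a row has no repeated offers among its candidate columns. The whole operation runs in NumPy without a Python-level loop over rows, which is what should make it practical for 1M records.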