
Cross referencing dataframes to extract specific values in python

Please help me create a Python function. I have two dataframes, DF1 and DF2. I want to add a column to DF1, DF1['Score'], based on the values in DF1 that match the ranges in DF2.

import pandas as pd
DF1 = pd.DataFrame({
     'Age':[25, 54, 33],
     'Income' :[10203, 23822, 84823],
     'Contract Length':[18, 12, 36],
     #'Score':[]
          })

DF2 = pd.DataFrame({
     'variable':['Age', 'Age', 'Age', 'Age',
                 'Income', 'Income', 'Income', 'Income',
                 'Contract Length', 'Contract Length', 'Contract Length', 'Contract Length'],
     'LQ':[ 25, 32.25, 39.5, 46.75, 10203, 28858, 47513, 66168, 12, 18, 24, 30],
     'UQ':[ 32.25, 39.5, 46.75, 54, 28858, 47513, 66168, 84823, 18, 24, 30, 36],
     'Score':[5, 10, 15, 20, 10, 15, 20, 25, 15, 20, 25, 30]
          })

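Taking customer UID 1 in DF1 as an example: he is 25 years old, has an income of 10,203 and a contract length of 18. Based on DF2, I would like to add a score of 30 to DF1['Score'] for customer 1, calculated as 5 (for age 25 to 32.25) + 10 (for income 10,203 to 28,858) + 15 (for contract length 12 to 18).

Please help me create a Python function that adds the correct score to DF1['Score'] for all customers.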

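You can use pandas.DataFrame.apply to iterate over the rows of the first dataframe and pick out the rows of the second dataframe that match the conditions.

Creating the data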

import pandas as pd

dict1 = {'customer UID': {0: 1, 1: 2, 2: 3}, 'Age': {0: 25, 1: 54, 2: 33}, 'Income': {0: 10203, 1: 23822, 2: 84823}, 'Contract Length': {0: 18, 1: 12, 2: 36}, 'Score': {0: '', 1: '', 2: ''}}

dict2 = {'variable': {0: 'Age', 1: 'Age', 2: 'Age', 3: 'Age', 4: 'Income', 5: 'Income', 6: 'Income', 7: 'Income', 8: 'Contract Length', 9: 'Contract Length', 10: 'Contract Length', 11: 'Contract Length'}, 'LQ': {0: 25.0, 1: 32.25, 2: 39.5, 3: 46.75, 4: 10203.0, 5: 28858.0, 6: 47513.0, 7: 66168.0, 8: 12.0, 9: 18.0, 10: 24.0, 11: 30.0}, 'UQ': {0: 32.25, 1: 39.5, 2: 46.75, 3: 54.0, 4: 28858.0, 5: 47513.0, 6: 66168.0, 7: 84823.0, 8: 18.0, 9: 24.0, 10: 30.0, 11: 36.0}, 'Score': {0: 5, 1: 10, 2: 15, 3: 20, 4: 10, 5: 15, 6: 20, 7: 25, 8: 15, 9: 20, 10: 25, 11: 30}}

df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)

Generating the output

def get_values(row):
    # For each variable, build a boolean mask over df2 that selects the band
    # whose [LQ, UQ] range contains this row's value
    age_condition = (row.Age >= df2['LQ']) & (row.Age <= df2['UQ']) & (df2.variable == 'Age')
    income_condition = (row.Income >= df2['LQ']) & (row.Income <= df2['UQ']) & (df2.variable == 'Income')
    contract_condition = (row['Contract Length'] >= df2['LQ']) & (row['Contract Length'] <= df2['UQ']) & (df2.variable == 'Contract Length')
    # Add up the score of the first matching band for each variable
    return df2[age_condition].Score.values[0] + df2[income_condition].Score.values[0] + df2[contract_condition].Score.values[0]

# Apply the function row-wise (axis=1) to score every customer
df1['Score'] = df1.apply(get_values, axis=1)

This gives us:

df1
   customer UID  Age  Income  Contract Length  Score
0             1   25   10203               18     30
1             2   54   23822               12     45
2             3   33   84823               36     65
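
The function above hard-codes one condition per column. As a rough sketch only (not part of the original answer), the same lookup can loop over whatever variables appear in df2. The helper name `score_row` and its `bands` parameter are made up for illustration, and raising an error for out-of-range values is an assumption about the desired behaviour.

```python
import pandas as pd

df1 = pd.DataFrame({
    'customer UID': [1, 2, 3],
    'Age': [25, 54, 33],
    'Income': [10203, 23822, 84823],
    'Contract Length': [18, 12, 36],
})

df2 = pd.DataFrame({
    'variable': ['Age'] * 4 + ['Income'] * 4 + ['Contract Length'] * 4,
    'LQ': [25, 32.25, 39.5, 46.75, 10203, 28858, 47513, 66168, 12, 18, 24, 30],
    'UQ': [32.25, 39.5, 46.75, 54, 28858, 47513, 66168, 84823, 18, 24, 30, 36],
    'Score': [5, 10, 15, 20, 10, 15, 20, 25, 15, 20, 25, 30],
})

def score_row(row, bands):
    """Hypothetical helper: sum the score of the first matching band per variable."""
    total = 0
    # sort=False keeps the variables in the order they appear in `bands`
    for variable, group in bands.groupby('variable', sort=False):
        value = row[variable]
        match = group[(group['LQ'] <= value) & (value <= group['UQ'])]
        if match.empty:
            # Assumption: a value outside every band should be reported, not ignored
            raise ValueError(f'{variable}={value} is outside every band')
        # A value on a shared edge matches two bands; take the first (lower) one
        total += match['Score'].iloc[0]
    return total

# Extra keyword arguments to apply() are forwarded to the function
df1['Score'] = df1.apply(score_row, axis=1, bands=df2)
print(df1)
```

Adding a new variable to df2 then needs no change to the function, as long as df1 has a column with the same name.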

For efficiency, you can use merge_asof after melting DF1:

DF1['Score'] = (pd
 .merge_asof(DF1.astype(float).reset_index().melt('index').sort_values(by='value'),
             DF2.sort_values(by='UQ'),
             by='variable', left_on='value', right_on='UQ', direction='forward'
             )
 .groupby('index')['Score'].sum()
)

Output:

   Age  Income  Contract Length  Score
0   25   10203               18   30.0
1   54   23822               12   45.0
2   33   84823               36   65.0
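
It may help to see how direction='forward' treats edge cases, since the match uses only UQ. The following is a small sketch with a single variable; the frames `bands` and `values` are illustrative and reuse the Contract Length bands from the question.

```python
import pandas as pd

# Upper bounds and scores of the Contract Length bands from DF2
bands = pd.DataFrame({
    'UQ': [18.0, 24.0, 30.0, 36.0],
    'Score': [15, 20, 25, 30],
})

# Illustrative lookups: a lower edge, a shared edge, an interior value,
# the top edge, and a value above every band
values = pd.DataFrame({'value': [12.0, 18.0, 18.5, 36.0, 40.0]})

# 'forward' picks the first row of `bands` whose UQ is >= value
matched = pd.merge_asof(values, bands, left_on='value', right_on='UQ',
                        direction='forward')
print(matched)
```

A value on a shared edge (18) lands in the lower band, which agrees with the expected score in the question. A value above the largest UQ gets NaN, and because LQ is never checked, a value below the smallest LQ would still be assigned the first band.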

Intermediates:

# reshape DF1 to long form
DF1.astype(float).reset_index().melt('index').sort_values(by='value')

   index         variable    value
7      1  Contract Length     12.0
6      0  Contract Length     18.0
0      0              Age     25.0
2      2              Age     33.0
8      2  Contract Length     36.0
1      1              Age     54.0
3      0           Income  10203.0
4      1           Income  23822.0
5      2           Income  84823.0

# merge_asof with DF2 (i.e. find the closest UQ value greater than or equal to the target)

(pd
 .merge_asof(DF1.astype(float).reset_index().melt('index').sort_values(by='value'),
             DF2.sort_values(by='UQ'),
             by='variable', left_on='value', right_on='UQ', direction='forward'
             )
)

   index         variable    value        LQ        UQ  Score
0      1  Contract Length     12.0     12.00     18.00     15
1      0  Contract Length     18.0     12.00     18.00     15
2      0              Age     25.0     25.00     32.25      5
3      2              Age     33.0     32.25     39.50     10
4      2  Contract Length     36.0     30.00     36.00     30
5      1              Age     54.0     46.75     54.00     20
6      0           Income  10203.0  10203.00  28858.00     10
7      1           Income  23822.0  10203.00  28858.00     10
8      2           Income  84823.0  66168.00  84823.00     25

# finally, group by "index" and sum the Score
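
The final step has no snippet of its own here, so this is a minimal sketch of it using the question's DF1 and DF2. The intermediate names `long_form`, `merged` and `scores` are chosen only for illustration.

```python
import pandas as pd

DF1 = pd.DataFrame({
    'Age': [25, 54, 33],
    'Income': [10203, 23822, 84823],
    'Contract Length': [18, 12, 36],
})

DF2 = pd.DataFrame({
    'variable': ['Age'] * 4 + ['Income'] * 4 + ['Contract Length'] * 4,
    'LQ': [25, 32.25, 39.5, 46.75, 10203, 28858, 47513, 66168, 12, 18, 24, 30],
    'UQ': [32.25, 39.5, 46.75, 54, 28858, 47513, 66168, 84823, 18, 24, 30, 36],
    'Score': [5, 10, 15, 20, 10, 15, 20, 25, 15, 20, 25, 30],
})

# Step 1: reshape DF1 to long form, one row per (original row, variable)
long_form = DF1.astype(float).reset_index().melt('index').sort_values(by='value')

# Step 2: attach the band whose UQ is the first one >= value, per variable
merged = pd.merge_asof(long_form, DF2.sort_values(by='UQ'),
                       by='variable', left_on='value', right_on='UQ',
                       direction='forward')

# Step 3: add up the three per-variable scores for each original row
scores = merged.groupby('index')['Score'].sum()
print(scores)  # index 0 -> 30, 1 -> 45, 2 -> 65

# The result is indexed by DF1's original index, so it aligns on assignment
DF1['Score'] = scores
```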

